ABSTRACT
This document describes the development and validation of
the Spanish-English Verbatim Translation Exam (SEVTE) for use
by the Federal Bureau of Investigation (FBI) in the selection of
applicants for the positions of Language Specialist or Contract
Linguist. The report is divided into eight sections. Section 1
describes the need for the test, reviews the literature on the 
testing of translation ability, and discusses the development of 
translation skill level descriptions. Section 2 describes the 
multiple-choice and production sections of the SEVTE, scoring 
procedures and time limits. Sections 3 and 4 describe the 
development, trialing, and pilot testing. Section 5 describes the 
design and validation study, which included members of the FBI, the
Houston Police Department, and professional translators. Section 6 
presents statistics on the scores of the subjects, and analyzes the 
reliability of each SEVTE section. Section 7 discusses content 
validity. Section 8 describes the equating of the two parallel forms, 
and the establishment of a cut score on the SEVTE multiple-choice 
section. Appended materials include sample test items, administration 
instructions, scoring guidelines, the FBI/Center for Applied
Linguistics Translation Skill Level Descriptions, questionnaires, and
other data collection instruments. (Author/VWL)

Abstract



This document describes the development and validation of 
the Spanish-English Verbatim Translation Exam (SEVTE) for use
by the Federal Bureau of Investigation (FBI) in the selection of 
applicants for the positions of Language Specialist or Contract 
Linguist. The report is divided into eight sections. Section 1 
describes the need for the test, reviews the literature on the 
testing of translation ability, and discusses the development of 
translation skill level descriptions. Section 2 describes the
multiple-choice and production sections of the SEVTE, scoring
procedures and time limits. Sections 3 and 4 describe its
development, trialing and pilot testing on translation students 
at Georgetown University. Section 5 describes the design of the 
validation study, which included 44 employees of the FBI, members 
of the Houston Police Department, and professional translators. 
Section 6 presents descriptive statistics on the scores of the 
above subjects, and analyzes the reliability of each SEVTE
section using traditional methods and generalizability theory.
The results indicate that the SEVTE is quite reliable for a test 
that involves free response items. Section 7, the longest of the 
report, begins with a discussion of content validity. Subsequent 
subsections discuss the evidence for construct, criterion- 
related, convergent and discriminant validity based on the 
results of the validation study. The results indicate that the 
two SEVTE constructs, Accuracy and Expression, are interrelated, 
but measure different dimensions of translation ability. Section 
8 describes the equating of the two parallel forms, and the 
establishment of a cut score on the SEVTE multiple-choice 
section, which can be used as a screening test. The 18 
appendices include sample test items, administration 
instructions, scoring guidelines, the FBI\CAL Translation Skill 
Level Descriptions, questionnaires and other data-collection 
instruments.

Abstract

This report describes the development and validation of the
Spanish-English Verbatim Translation Exam (SEVTE). The SEVTE
was developed by staff at the Center for Applied Linguistics 
(CAL) under contract with the Federal Bureau of Investigation 
(FBI). The SEVTE is designed to be a job-relevant test of the
ability to render a translation in English of a text written in 
Spanish. The report is divided into five sections, plus 
appendices. 

Section 1 provides an introduction to the project and 
establishes a framework for the project. This section describes 
the groups that would potentially be given the test, the survey 
of the types of documents the FBI needs to have translated, the 
development of ILR skill level descriptions for translation, the 
nature of translation, and the emergence of the two constructs of 
translation ability that are measured by the SEVTE. 

Section 2 provides a description of the test, which is
divided into multiple-choice and free-response sections. The
scoring of the test is also described, and the computation of
total scores on two criteria, Accuracy and Expression, is
discussed.

Sections 3 and 4 describe the development and pilot testing 
of the SEVTE and the successive revisions it underwent.

Section 5 describes the validation study that was conducted 
on the final version of the test. It discusses the test 
administration procedures, the sample, and the scoring of the
tests. For this study, 66 examinees took both forms of the
SEVTE. The subjects were FBI Language Specialists, Special
Agents, and support staff, as well as members of the Houston, TX
Police Department and employees of the Central Intelligence
Agency.

Section 6 presents descriptive statistics on test 
performance from the validation study as well as a detailed 
analysis of the reliability of the test. Reliability analyses 
include internal consistency, product moment correlations, and 
generalizability coefficients.

Section 7 presents the discussion of the validity of the 
exam. For this study, additional data was collected from 
employee files in the form of independent measures of proficiency 
in Spanish and English, and scores on an earlier generation of 
FBI translation tests. Subjects also completed a self-rating of
the ability to translate various types of FBI documents. A 
number of statistical analyses were performed on the data. The 
results establish the validity of the constructs measured and 
support the validity of the SEVTE for the screening, selection, 
and placement of FBI applicants and staff in positions requiring 
Spanish-English translation ability.

Section 8 of the report describes the development of a score 
conversion table, which can be used to convert scores on the 
SEVTE to an overall rating of translation proficiency on a 0 to 5 
scale. 

Eighteen appendices follow the body of the report. These
provide additional data and information relating to matters
discussed in the text.

1. Introduction

This section of the report on the Spanish into English 
Verbatim Translation Exam (SEVTE) is intended to provide the 
reader with some appropriate background as a preliminary to a
discussion of the test.
1.1. Need for the Test

The Federal Bureau of Investigation (FBI) is the Federal 
Government's principal agency responsible for investigating 
violations of federal statutes. The overall objective of the FBI 
is to investigate criminal activity and civil matters in which 
the Federal Government has an interest, and to provide the 
Executive Branch with information relating to national security. 
FBI activities include investigations into organized crime,
white-collar crime, public corruption, financial crime, fraud
against the Government, bribery, copyright matters, civil rights
violations, bank robbery, extortion, kidnapping, air piracy,
terrorism, foreign counterintelligence, interstate criminal 
activity, fugitive and drug trafficking matters, and other 
violations of more than 260 federal statutes. 

In all of the above areas of jurisdictional responsibility, 
it is likely that the FBI could be called upon to investigate a 
large number of cases that involve languages other than English. 
Because of this, it is understandable that the FBI is 
increasingly called upon to provide Special Agents and other 
employees who are proficient in a foreign language. All modes of 
communicative skills may be required. That is, FBI staff may
need to be able to speak, understand, read, or write the foreign
language. They may also be required to provide oral 
interpretation or written translation. Often, they are called 
upon to provide a written summary in English of a foreign 
language conversation. 

The need to assess employees' or potential employees' 
language skills can be satisfied in a number of ways. To measure 
the speaking skill, the FBI has used the Interagency Language
Roundtable (ILR) Oral Proficiency Interview for many years. To 
measure the listening and reading skills, the FBI uses the 
Listening and Reading sections of the Defense Language 
Proficiency Test (typically version II) (Walker et al., 1988).
These exams are taken by applicants for the positions of Special
Agent Linguist,^ Language Specialist, and Contract Linguist.

The FBI also has the need to measure the ability to provide 
a written English summary of a non-English conversation.
Frequently, this conversation involves a telephone communication 
that has been authorized by a magistrate as part of an ongoing 
criminal investigation. CAL developed the Listening Summary 
Translation Exam (LSTE) as part of its contract with the FBI.^ 



^Special Agent Linguists are Special Agents who are 
qualified to investigate crimes involving foreign languages. 

^The LSTE presents taped Spanish language conversations as 
stimuli and requires the examinee to answer multiple-choice 
questions or to provide a written summary as a response. The 
LSTE provides scores on the accuracy (including adequacy) of the 
information in the summary and on the quality of the English 
expression contained in the summary.

The development and validation of the LSTE is the subject of a
separate report (Stansfield, Scott & Kenyon, 1990a), and is not 
formally treated in this report. 

The FBI also has the need to measure the ability to
translate written documents. Up until now, this need has been
satisfied for some 20 languages through two parallel translation
exams. Since these exams are secure instruments, CAL staff know
nothing about them other than the fact that the FBI feels a need
to develop new translation exams. Because of this, the FBI
issued a request for proposals (RFP) to develop a completely new
test of translation skills, which is the subject of this report
and a companion report (Stansfield, Scott & Kenyon, 1990b).
1.2. Intended Use 

The SEVTE is designed for use in the hiring of Language 
Specialists and Contract Linguists. Language Specialists are 
full time regular employees of the FBI, while Contract Linguists 
are self-employed and work on an hourly basis. The translating 
work of Language Specialists and Contract Linguists is primarily 
document-to-document or audio-to-document. The subject matter 
may be in any area in which the FBI has jurisdiction. As 
indicated on an FBI job announcement, an FBI Language Specialist 
is a full time employee whose duties are to "translate both 
recorded and written material, into English and vice versa, which 
involve a wide range of difficult subject matter containing 
technical or specialized terminology such as used in fields of 
law, politics, science, economics, and international exchange, as
well as nontechnical subject matter."

The SEVTE would be taken by civilians who are applying for 
these two categories of position, and by current FBI employees,
such as support staff, who are seeking a promotion to the
position of Language Specialist.

According to the statement of work in the RFP, CAL is to 
provide a test that can measure translation ability at levels 2+ 
through 5. Such levels would be appropriate for Language
Specialists and Contract Linguists. SEVTE scores will provide
supervisors with an indication of an examinee's suitability for a given
work assignment involving Spanish to English translation.
1.3. FBI Translation Needs Survey

One of the first tasks undertaken during this project was
the development of a questionnaire for the purpose of conducting 
a survey of the type of translation work required of Language 
Specialists in FBI field offices. It was hoped that this survey 
of the FBI's translation needs would be of help in determining an 
appropriate balance of topics and tasks for the tests to be 
developed. This questionnaire was developed by CAL staff during
August 1988, and was subsequently revised by the FBI. Following 
these revisions, FBI Headquarters mailed two copies of the 
questionnaire to Language Specialists working in FBI field 
offices across the country. A total of 28 Language Specialists 
replied to the questionnaire. The questionnaire concerned 
translating from Spanish to English and from English to Spanish. 
The last page of the questionnaire was devoted to translating
from English to Spanish. A copy of the questionnaire and the
results are included in Appendix Q. The questionnaire required
the Language Specialists to indicate the proportion of time they
spend translating each type of document listed in the
questionnaire. Unfortunately, the results of the questionnaire
are limited, since many individuals' responses totaled more than
100%. Still, the results of the questionnaire did provide
supporting information for the development of the LSTE, the 
SEVTE, and the ESVTE. In general, the results indicated that 
Language Specialists spend more time doing listening tasks than 
translating written texts, particularly monitoring and
translating telephone and recorded conversations. They are also 
called upon to provide oral interpretations. 

More than half of the Language Specialists responding 
indicated they are often called upon to translate or summarize 
written material. The material these respondents most often deal 
with involves organized crime, narcotics, terrorism, and 
counterintelligence. 

The results of this survey were used to select topics for 
the written and recorded stimuli that appear on the three tests 
developed for this project. 

1.4. FBI\CAL Translation Skill Level Descriptions
1.4.1. History 

Over the years there have been a number of attempts by
government agencies to develop skill level descriptions (SLD) for 
translation. None of these have been accepted outside of the
agency in which they were developed. The FBI also developed a
set of translation SLDs a number of years ago. However, the
Bureau was not satisfied with them. As a result, the Statement
of Work in the FBI's Request for Proposals called for the
development of new translation skill level descriptions (see
Appendix R). The statement of work also called for scores on the
test to be convertible to the 0-5 ILR scale. As a result, CAL
proposed to develop such skill level descriptions as part of this
project. Once the project was funded, the first deliverable to
be developed was the translation SLDs. These were needed to
inform the test development process, and, in particular, to
inform the scoring of the test and the conversion of the scores 
to the 0-5 scale. Thus, soon after notification of funding was 
received, CAL staff went to work on the skill level descriptions. 

In July 1988, CAL staff met with the project monitor and 
five FBI staff at FBI headquarters. Attending were FBI master 
translators.^ At this meeting it was agreed that, in order to 
help CAL begin the development of ILR skill level descriptions 
for translation, by the end of the month the FBI staff present 
would write a personal definition of what constitutes an 
excellent translator, a good translator, a mediocre translator, a 
poor translator, and a bad translator. It was agreed that CAL 
would use the descriptions of these five groups of translators as 
a point of departure for preparing skill level descriptions for 



^Language Specialists at FBI Headquarters in Washington, 
D.C. are referred to as Master Translators. 
translation. Because FBI staff were familiar with the ILR SLDs,

descriptions. The following description of a "mediocre"
translator illustrates the kind of descriptions that were 
received. 

"Able to provide an understandable and fairly accurate
translation of a larger number of texts, but still makes a number
of mistranslations. Problems with spelling, grammar, and
punctuation. Becomes lost when structure becomes complex or
language more sophisticated and has serious problems with slang,
idioms and handwritten materials."

The descriptions of different groups of translators provided 
by FBI staff, although brief and informal, were used as a
starting point for writing skill level descriptions. 

CAL staff began by writing descriptions for level 5 
translation, and then worked down the scale to level 0+. The 
first set of skill level descriptions was drafted by Ana Maria 
Velasco, an experienced translator familiar with the ILR scale. 
She drafted the descriptions based on her experience evaluating 
the work of many different translators. In consultation with the 
project director, Ms. Velasco selected seven variables that 
should enter into the judgement or rating of a translation. 
These were accuracy, grammar (morphology), syntax (word order), 
style, tone, spelling, and punctuation. She placed these 
variables on the vertical axis of a scoring grid (matrix). The
horizontal axis contained 10 points on the ILR scale ranging from
0+ to 5. In each cell of the grid, she included a statement of
the nature of translations at that level. Both skill level
descriptions and a scoring grid were developed, since it was
thought that a scoring grid that separated each translation
variable by level and allowed comparisons by variable across
levels would be helpful to raters. It was also recognized that
the grid would be useful in the revision of the skill level
descriptions for the same reasons. That is, the description of
ability on each relevant variable in the scoring grid could be
consulted in the writing of the skill level descriptions. The
final reason for producing the scoring grid was that we were
unaware at the time which document, the grid or the skill level 
descriptions, could be used to score the test more reliably. 

The project director then reviewed the skill level 
descriptions and the scoring grid, making revisions where 
appropriate. His revisions were based on careful analysis of the 
wording of all the current ILR skill level descriptions, 
particularly the reading level descriptions. The revised SLDs 
and the scoring grid were then subject to careful review by 
Marijke Walker and her staff at the FBI. They responded to the
draft descriptions based on their experience evaluating the
translations of Language Specialists and applicants for
employment as a Language Specialist. After receiving a set of
comments from Ms. Walker, CAL revised both documents. A major
revision to occur at this point, at the suggestion of Ms. Walker,
was the inclusion of syntax within grammar on the scoring grid
and the addition of vocabulary to the grid. (A copy of the grid
is included in Appendix I as Exhibit A.) Another substantive
revision was a change in the percentage correct criteria for
punctuation and spelling at level 5. It was decided that for
purposes of the grid, the translation need not be absolutely 
perfect in spelling in order to be at level 5. A brief 
description of the kinds of documents that can typically be 
handled by a translator at each level was included. 

On December 5, 1988, a meeting was held at FBI Headquarters 
to review the revised set of translation SLDs. Present at the 
meeting were Charles W. Stansfield and Ana Maria Velasco from 
CAL, Marijke Walker and her staff, Thomas Parry from the Central 
Intelligence Agency, and James Child from the Department of 
Defense. During this meeting it was noted that the draft 
translation SLDs describe the characteristics of the translated 
document, while ILR SLDs for other modes of communication
describe the skills of the person being evaluated. It was 
suggested that the Translation SLDs should consistently describe 
the translator, rather than the translated document. It was also 
agreed to introduce this current draft of the descriptions to the 
ILR Testing Committee before making any revisions, and to ask 
committee members for written comments regarding how the draft 
can be improved. 

These translation SLDs were the subject of a brief 
discussion at the December meeting of the ILR Testing Committee 
two days later. Members of the committee were given a
questionnaire concerning the SLDs to complete and mail to CAL
(see Appendix I, Exhibit B). Unfortunately, no questionnaires
were returned. The committee met again in February 1989, with
essentially the same outcome. While general and conceptual 
concerns were expressed at the meeting about the SLDs, only three 
specific suggestions for improvement were made. These 
suggestions were (a) to change the descriptions so that they
referred to the translator rather than to the translation, as
suggested earlier, (b) to use the term "to render" when referring
to the act of translating, and (c) to reorder the descriptions so
that they begin with level 0 and progress to level 5. 

Following this meeting, Charles Stansfield and Marijke 
Walker worked jointly on several occasions to improve the SLDs. 
The ILR Testing Committee met again on March 8, 1989, to consider 
the next revision. At this meeting it was not possible to obtain 
organized and coherent feedback or approval of the descriptions. 
Thus, CAL and the FBI agreed subsequently that the level 
descriptions being developed for this project would be used by
the FBI, and that they would be available to the ILR for use as 
interim SLDs until such time as the ILR Testing Committee has 
time to consider and revise them further. Subsequently, 
Stansfield and Walker met again to make additional revisions on 
the SLDs. These revisions included the incorporation of some of 
the wording used in the previous set of translation SLDs used by 
the FBI. The task of developing and revising the translation 
SLDs was completed in June 1989. No further work was done on
them for seven months.

The Verbatim Translation Exams that CAL developed for the 
FBI were administered during the months of November and December 
1989. After scoring the Listening Summary Translation Exam, CAL
staff and consultants then scored the production portions of the 
verbatim translation exams. Soon it became apparent that there 
were limitations in the ability of the SLDs to describe all 
examinees. The problem seemed to lie in the fact that some 
examinees were translating into their native language and some 
into a second language. In the case of a number of examinees, 
there was a considerable discrepancy in the proficiency in the 
two languages. Examinees who were translating into their native 
language, especially English, produced translations that were 
very fluent and grammatical, but inaccurate in terms of content.
Similarly, when translating into the second language, some 
examinees produced accurate translations that evidenced problems 
with grammar or vocabulary. As a result, on January 30, 1990, 
Stansfield and Scott sent a memo to Marijke Walker at the FBI in 
which they recommended that the current SLDs be divided into two 
parts: one for Accuracy and one for Expression, and that 
separate scores be assigned for each. CAL also recommended that 
the discussion of the kinds of documents a translator at a given
proficiency level can handle be deleted from the SLDs, since the 
verbatim exams did not provide the opportunity to examinees to 
translate all of the types of documents mentioned. The FBI 
agreed to this change. It is most significant that the results
of the validation study supported this division of translation
abilities. 

The current version of the SLDs is basically the same as the
one that was used to score the verbatim translation exams.
However, after the scoring of the test was completed, we realized
that the discussion of the kinds of documents a translator at a
given proficiency level can successfully render is useful
interpretive information for test score users.* Therefore, the
version of the SLDs included in this report presents this
discussion following the SLDs for Accuracy and Expression. It
should be remembered, however, that the raters of the SEVTE did
not use this interpretive information when scoring the responses
of examinees who participated in the validation study. 
1.4.2. Explanation of the Skill Level Descriptions 

The FBI\CAL translation SLDs are divided into three parts. 
The first part is the Accuracy description. Accuracy is the 
ability to correctly convey the information in the source 
document. The second part of the description is the Expression 
description. This describes the examinee's command of the 
written form of the target language. The third part of the 
translation skill level descriptions is the interpretive 
information. This is a sentence describing the general ability
level of the examinee and the types of documents that he or she 



*It should be pointed out that there is no empirical data,
in the form of a criterion-related or predictive validity study,
to support this interpretive information.

can be expected to translate successfully.

Because an examinee may be called on to translate into his
or her native language or second language, it was necessary to 
separate the ratings for Accuracy and Expression. By evaluating 
Accuracy and Expression separately, the level descriptions can be 
used to characterize an examinee whose translation is accurate 
but may evidence some problems with grammar or vocabulary. 
Otherwise, two different examinees might receive the same score
from a rater who is attempting to compensate for either lack of
Accuracy in the information conveyed or lack of grammaticality in
the translation. A personnel administrator trying to make a
decision on hiring would not have sufficient information from a
score combining Accuracy and Expression to make an informed
decision. This is because a typical profile of a level 2
(Accuracy) translator, when translating into his or her native
language, may be a level 4 in Expression but only a level 2 in
Accuracy. Such an individual could not handle the kind of
documents mentioned in the ILR reading descriptions for Level 3 
or those mentioned in the interpretive information for level 3 of 
the translation SLDs. On the other hand, with separate scores 
available for Accuracy and Expression, an administrator would be 
able to make a decision to hire an examinee whose translations 
would be accurate though unpolished. 
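
The case for reporting two scores can be illustrated with a minimal sketch. The names `TranslationProfile` and `meets_requirement`, the ordering of scale points, and the example minimum levels are assumptions made for illustration only; they are not part of the SEVTE or of any FBI hiring rule.

```python
from dataclasses import dataclass

# Scale points from 0 to 5, including plus levels (illustrative ordering).
LEVELS = ("0", "0+", "1", "1+", "2", "2+", "3", "3+", "4", "4+", "5")


@dataclass(frozen=True)
class TranslationProfile:
    """Separate ratings for one examinee (hypothetical container)."""
    accuracy: str
    expression: str


def meets_requirement(profile, min_accuracy, min_expression):
    """Return True if both ratings reach the stated minimum levels."""
    rank = LEVELS.index
    return (rank(profile.accuracy) >= rank(min_accuracy)
            and rank(profile.expression) >= rank(min_expression))


# Fluent but inaccurate: level 4 in Expression, level 2 in Accuracy.
fluent = TranslationProfile(accuracy="2", expression="4")
# Accurate though unpolished.
accurate = TranslationProfile(accuracy="3", expression="2")

# Hypothetical assignment needing level 3 Accuracy and level 2 Expression.
print(meets_requirement(fluent, "3", "2"))    # False
print(meets_requirement(accurate, "3", "2"))  # True
```

A single combined score could not separate these two examinees in this way, which is the point the paragraph above makes about the administrator's decision.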

The three parts of the translation SLDs, unlike the SLDs for 
listening, speaking, reading and writing, must be in separate 
sections. This is because translation involves two languages,
and the examinee's ability in each language may not be equal.

The first part of the SLDs is the Accuracy description. The
Accuracy description focuses on whether the information contained
in the source document is distorted or lost in the translation,
or whether information has been inserted in the translation that
was not in the source document. In the field of translation, 
such problems are referred to as mistranslation, omission, or 
addition. Scoring a translation for Accuracy requires comparing 
it with the original. The Accuracy descriptions refer to the 
ability to sustain performance (to render the document into the 
target language successfully) over a wide variety of documents 
varying in type and difficulty, rather than a single document.
In general, Accuracy is the principal ability being measured in a 
test of translation. Thus, the Accuracy rating is the principal 
rating of the examinee's ability to translate. 

Again, it must be remembered that this rating is descriptive 
of the ability to translate a wide variety of documents. A level
3 translator may translate a level 1 document perfectly, thus
making it appear to be a level 5 translation. Similarly, the 
same translator given a level 5 document may produce a 
translation that appears to be less than level 3. 

Because the accuracy of a translation may vary according to 
the difficulty of the document being translated, the developer of 
translation skill levels faces a dilemma. It is necessary to 
choose a type of document or level of document (in terms of
difficulty and complexity) on which to base the Accuracy
descriptions. In this case, we chose to describe Accuracy in
rendering a hypothetical "average" or typical document. An
average document encountered by an FBI Language Specialist, in
terms of difficulty, would be one at level 3 or mostly at level
3, which would make it a 2+. As the translator moves above level
3 in ability, he or she, by definition, can handle documents of
above average difficulty. That is, he or she can handle 
documents at level 3+, 4, or even higher. The Accuracy 
description nicely represents both the translation ability level 
of the examinee and the level of task or document that the 
examinee can handle adequately.

The second part of the skill level descriptions is the
Expression description. Expression involves all the linguistic 
variables apparent in a translated document except Accuracy. 
These variables are grammar, syntax, vocabulary, style, tone, 
spelling, and punctuation. In general, it is possible to score a 
translation for most of these variables without referring to the 
source document. However, it will sometimes be necessary,
especially in the case of higher level documents, to compare the 
source document with the translated document, particularly if the 
style and tone of the translated document are to be evaluated. 

The discussion of the type of documents a person can handle 
that initiates each SLD for the other skills is not truly part of 
the translation scale. It is merely score interpretation
information that is of interest to score users.*

When using the interpretive information, a score user should
remember that it refers to the type of documents that an examinee
can handle successfully. Efforts to translate more sophisticated
documents than those associated with that level or lower levels
will result in less than adequate translations.



*If the information on the type of documents a translator 
can handle were to be incorporated into the translation SLDs, 
then a rater would have to administer the documents mentioned to 
an examinee in order to verify that the statement is correct. 
This would require some type of tailored face-to-face testing. 
That is, the test administrator would have to select and 
administer a document to the examinee. Then, the test 
administrator would have to wait for the examinee to render a 
written translation of the document. Once the rater received the 
document, it would have to be scored immediately. Then, the test 
administrator would have to select another document, associated 
with a higher or lower level on the scale, and administer it to 
the examinee, and continue the process again until the rater was 
satisfied that he or she had identified the highest level of 
document that the examinee is able to translate faithfully. 
Doing this would require a full day to test each examinee, which is 
impractical for reasons of cost. Thus, the interpretive 
information in the translation SLDs is not of interest to raters 
of translated documents. 

Another theoretical possibility involving tailored testing 
would be to let a computer select, administer, and score the 
translation using the skill level descriptions as a basis for 
scoring. While a computer could select a document of 
predetermined difficulty, and administer it to the examinee, and 
the examinee could key-enter a translation of the document on the 
computer screen, it is not yet feasible for a computer to score a 
translation using even an analytic scale, and it is doubtful that 
a computer will be able to use a holistic scale (such as the 
SLDs) for many years to come. Thus, it is not possible to 
develop a tailored test of translation ability at this time. 
Other ILR SLDs, such as those for speaking and reading, assume 
that tailored face-to-face testing is possible. Thus, the 
inclusion in the other ILR SLDs of the type of documents or tasks 
that can be handled is more logical. It is not logical to 
include them as an integral part of the Translation SLDs. 
1.5. The Nature of Translation Ability 
1.5.1. The Need to Define the Construct 

Bachman (1990, p. 251), citing Upshur, distinguishes between 
viewing a test score as a pragmatic ascription (the individual is 
able to perform a task) and viewing a test score as a measure 
of some human construct (the individual has a certain ability). 
Bachman notes that there is often confusion between the 
measurement of the activity and the measurement of the construct 
and the processes that underlie it. Indeed, he notes that the 
activity is often confused with the construct and vice versa. 

Bachman's characterization of this confusion regarding 
validity is somewhat analogous to the dilemma we encountered when 
we wrote our proposal to do this project in September 1987. In 
this case, we started with products (translations), and in the 
process of developing the test, we identified the constructs 
involved in the measurement of translation ability. We learned 
that translation ability is most appropriately expressed through 
two main constructs, accuracy and expression. 

It is important to distinguish between translation ability 
as a measurement construct and translation ability as a 
psychological construct. A measurement construct is one that 
holds up under statistical analysis, such as factor analysis or 
other appropriate procedures. It should be supported by 
descriptions of the psychological construct, which refers to the 
mental operations and processes involved. Neither the 
measurement construct nor the psychological construct was 
understood at the start of this study. Thus, we entered the 
study fully aware that we were sailing uncharted waters. While 
hopeful that we would make some discoveries, we were fully aware 
that any test we constructed might not stand up to scientific 
analysis. Thus, we were aware that we might fail in our effort 
to construct a reliable and valid test of translation ability. 

In terms of a psychological construct, we identify 
translation ability as a nexus of psychological and linguistic 
knowledge, skills and abilities that can be combined with real 
world knowledge to produce a translated document. This is an 
initial definition of translation as a process; it is in no sense 
a description of the process. At present, there is almost no 
understanding of the translation process. Moreover, the level of 
ignorance about translation is exacerbated by the fact that many 
translators have written about it and their writings create the 
impression that a literature on the process exists and, 
therefore, that the process is at least partly understood. 
1.5.2. The Literature on Translation 

The writing of translators about translation has focused on 
the best approach to translation.* Two main approaches have 
characterized the discussion. These are literal translation and 
free translation. Those who espouse a literal translation strive 
to be faithful to the language of the source document, while 
those who espouse a free translation strive to produce a 
rhetorical effect similar to that of the source document. Thus, it 
can be seen that academic discussions of translation center on the 
subject of equivalence. That is, how does one produce a target 
document that is equivalent to the source document?** 

*Because the literature on translation was largely unhelpful 
and did not inform this test, we have not attempted to include a 
formal review of the literature here. Instead, we will give only 
a brief summary. 

A discussion of this nature is far from a scientific 
discussion. Indeed, almost everyone who writes about translation 
appears to be unaware that translation is an ability that can be 
the subject of scientific inquiry. Moreover, when the 
possibility of developing a scientific knowledge base about 
translation is raised, it is quickly dismissed. With regard to 
this possibility, Newmark, who is probably the best known of 
those who write about translation, has stated: "There is no such 
thing as a science of translation, and there never will be" 
(1981, p. 113). 

Apart from the questions of approach and equivalence, there 
is also some literature on the nature of a good translation, 
which might appear to be relevant to the measurement of 
translation ability. In a portion of this literature, 
translators usually describe some problems they encountered in 
translating specific documents. Another portion of this 
literature discusses the characteristics of a good translator or 
translation. The characteristics are usually stated in the form 
of ascriptions, i.e., is sensitive to the nuances of words in 
both languages, is sensitive to style, tone and purpose. Such 
ascriptions do not help us to understand translation as a 
psycholinguistic process or even the appropriate constructs to 
measure. 

**Recently, there has been some attention to the role of text 
characteristics in determining the approach to use. For a 
summary of the rhetoric on equivalence and on the role of text 
characteristics, see Pochhacker (198S). 

Some authors have noted that there are certain prerequisites 
to being a translator. Apart from the attitudinal 
characteristics, such as a love of language, most notable among 
these are a knowledge of the language of the source document, a 
knowledge of the language of the target document, and some 
knowledge of the subject.* Again, this information, while 
accurate, was not helpful to us in developing a test of 
translation ability.** 

1.5.3. The Emergence of the Constructs 

In this study, we identified Accuracy and Expression as the 
measurement constructs of relevance. We define Accuracy as the 
ability to render the information or propositions in the source 
document into the target document without mistranslations, 
additions, or deletions. We define Expression as the ability to 
express oneself appropriately in the target language in the 
context of a translation. 

*Knowledge of the subject is viewed as being less important, 
since it is considered that one can learn this quite easily by 
reading on the subject prior to beginning the translation. It is 
interesting to note that we did not encounter a single mention of 
"schema theory" in writings on translation. 

**At the start of the study, we did a computer-assisted search 
of the ERIC database, using "translation" and "language testing" 
as major descriptors. The seven titles this search produced 
dealt with translation as a method for testing language 
proficiency or achievement. Not a single one dealt with the 
measurement of translation ability per se. 

We could not identify these constructs at the start of the 
project. Instead, they emerged slowly as the project progressed. 
As indicated in section 1.4, the first task in this project was 
the development of skill level descriptions (SLDs). These SLDs 
combined statements referring to Accuracy, to categories of 
expression, and to the type of documents a translator can handle. 
The SLDs were written so that they could be used in some way when 
scoring the test or referenced when interpreting the test score. 
Once the descriptions were drafted, we began developing the 
tests. 

The process of scoring trial tests and pilot tests provided 
us with more experience in the measurement of translation. For 
instance, pilot testing taught us that people performed much 
better when translating into their native language. Thus, we 
learned that a single set of skill level descriptions could not 
be used to characterize translation ability in both directions. 
For the sake of parsimony, we had initially hoped that it would 
be possible to characterize a translator through a single 
proficiency rating that would indicate his or her ability to 
translate in both directions; that is, from native language to 
target language and from target language to native language. 
While this may seem naive in retrospect, at the time we were 
influenced by the elimination of the distinction between native 
languages and second languages in linguistics (see Kachru, 1985), 
since proficiency in either can range from almost none to 
distinguished. Thus, we were not willing to accept the 
recommendation that separate sets of SLDs be developed for 
translating in each direction. Since we believed a single set of 
SLDs would be adequate, we also believed that a single rating 
could characterize translation ability in both directions, and 
that separate ratings for each direction were not necessary. The 
experience of scoring pilot tests which were given in both 
directions made us doubt this assumption, and in the ensuing 
months we abandoned the idea entirely. Still, we believed, and 
we continue to believe, that the same set of SLDs can be used for 
both directions, and that the development of a separate set of 
SLDs for translating to the native language and another for 
translating to the second language is unwise.* Thus, we began 
the project believing that a single holistic score could 
represent translation ability, and by the end of the pilot 
testing we had modified our ideas so that we now believed that 
two scores, one for translating in each direction, would be 
necessary. 

*A number of government translators advised us to do this. 

At this point another experience began to influence our 
ideas. During the fall of 1989, we administered, scored, and 
analyzed the Listening Summary Translation Exam. This test, 
which is the subject of another report (Stansfield et al., 
1990a), produced two scores, one for Accuracy and one for 
Expression. A separate score for Expression had always been 
considered for this test, since we were aware that errors in 
English writing ability have posed a problem for the FBI when 
translations of oral conversations are introduced in court. That 
is, even if a translation is accurate, if it is written poorly, 
the credibility of the information it contains becomes tainted. 

The analysis of the LSTE showed the validity of the Accuracy 
rating in terms of its correlation with other measures of 
proficiency in the language of the auditory stimuli. The 
analysis also showed Expression to be an entity different from 
and often unrelated to Accuracy. As a result, we concluded that 
Accuracy is the principal trait to be measured in a test of 
listening summary writing ability, but that it may also be useful 
to have an Expression score in order to identify examinees whose 
work may need to be reviewed before being used in a legal 
proceeding. 

As indicated in section 1.4.1, soon after scoring the LSTE, 
we began scoring the SEVTE and a parallel test in the opposite 
direction, the English - Spanish Verbatim Translation Exam 
(ESVTE). We soon realized that it would not be possible to use 
the SLDs to score the paragraph translation portion of these 
tests since the performance on the criteria relating to Accuracy was 
often incongruous with the performance on the criteria relating 
to Expression. At that point, it became apparent that the 
solution to this problem lay in considering Accuracy and 
Expression as separate constructs and assigning separate scores 
to each. This decision to divide translation ability into two 
constructs is supported by the many analyses reported in the 
section on validity of this report. Thus, while we began this 
project believing that translation ability in both directions 
could possibly be represented in a single rating, we ended the 
project having learned that four scores are necessary to 
represent translation ability, i.e., two for each direction. 
These scores do not describe the psychological construct or 
ability, but they do identify and define the measurement 
constructs. 

In order to gain an understanding of the psychological 
construct, psychologists and applied linguists will have to turn 
their attention to the process of translation. A description of 
these processes is essential to understanding the construct of 
translation ability. 

Due to the lack of relevant research on translation, this 
project was begun without an understanding of the construct to be 
measured. We ended the project without an understanding of the 
process of translation, but with the belief that we at least 
subdivided the construct in a practical way so that instruments 
can be developed to measure it. We believe the instrument 
described in the remaining sections of this report is a good one. 
However, in the coming decades other researchers will develop 
other instruments that may have greater reliability, due to 
improved scoring procedures, or greater validity, due to a better 
understanding of the psycholinguistic processes involved in 
translation. Nevertheless, it is likely that high quality 
instruments to measure translation ability will continue to focus 
on the constructs of Accuracy and Expression that emerged from 
this project. Thus, at this point, for the purpose of 
measurement, we believe it is possible to define the construct of 
translation as the ability to render accurately content 
information from a source language text to a target language text 
and the ability to express this information using appropriate 
target language grammar, syntax, vocabulary, mechanics, style, 
and tone. 






2. General Description 

The Spanish into English Verbatim Translation Exam (SEVTE) 
is designed to assess the ability to render a verbatim 
translation in English of source material written in Spanish. 

The SEVTE consists of two subtests. The first, referred to 
in this part of the report as the Multiple Choice section, 
consists of embedded phrase translation and error detection 
items. The second subtest, referred to as the Production 
section, requires translation of embedded phrases, sentences, and 
paragraphs. A separate test booklet, containing instructions, 
examples, and test items, is provided for each subtest. There 
are two forms of the SEVTE; they are generally parallel in 
content, item difficulty, format, and length. 
2.1. Multiple Choice Section 

This section of the report describes the format, and test 
taking and scoring procedures for the Multiple Choice section of 
the SEVTE. 
2.1.1. Format 

There are 60 items in the Multiple Choice section: 35 are 
Words and Phrases in Context (WPC) items, and 25 are Error 
Detection (ED) items. In a WPC item, an examinee is required to 
select the best translation of an underlined word or phrase 
within a sentence. In an ED item, an examinee must identify 
where an error is located within the sentence, or indicate that 
there is no error. ED items are written in the target language 
only; errors may consist of incorrect grammar, word order, 
vocabulary, punctuation, or spelling. (There is no more than one 
error per item.) 

The multiple choice items are designed to test specific 
grammar points such as subject-verb agreement, verb tense 
(preterit vs. imperfect, subjunctive, etc.), pronouns, 
prepositions, gender, or word order; or vocabulary, including 
noun, verb, adverbial, and adjectival phrases, and false 
cognates. The results of a content analysis* of the SEVTE 
Multiple Choice sections are displayed in Appendix D. Briefly, 
30-32% of the items assess knowledge of grammar, 60% assess 
knowledge of vocabulary, 8% assess knowledge of mechanics 
(spelling or punctuation), while 5% of the items contain no 
error.** 

The test booklet contains instructions, example items for 
each subsection (WPC and ED), explanations of the example items, 
and the test items. Appendix B contains selected portions of a 
test booklet for the Multiple Choice section, including the cover 
page, instructions, and example items. This appendix can be used 
by the FBI to construct an examinee handbook. 
2.1.2. Test Taking 

Each examinee receives a Multiple Choice section test 
booklet, a machine scoreable answer sheet, and two no. 2 pencils. 



*The content analysis of the test was carried out by CAL staff 
and then verified by FBI Headquarters staff. 

**Some of the items test knowledge of more than one aspect 
of language. 

Examinees listen as the test supervisor gives instructions for 
[text illegible in source] booklet cover page. Subsequently, they 
are given 35 minutes to complete the Multiple Choice section. 
2.1.3. Scoring Procedures 

Examinees record their responses to the Multiple Choice 
section of the SEVTE on answer sheets which are scored by 
machine. The score on this section is the number of answers 
correct. The maximum possible score is 60. 
2.2. Production Section 

This section of the report describes the format of the 
Production section as well as test taking and scoring procedures. 
2.2.1. Format 

There are 28 production items on each exam form: 15 items, 
called Word or Phrase Translation (WPT), require translation of 
underlined words or phrases in sentences; 10 items, called 
Sentence Translation (ST), require translation of complete 
sentences; and three items, called Paragraph Translation (PT), 
require translation of entire paragraphs.* 

The test booklet contains instructions, an example of each 
item type (except for the paragraphs), a brief discussion of each 
example item, and the test items. Space is provided in the 
booklet for the examinee to write the translation below each 
item. Appendix C contains selected portions of a test booklet 
for the Production section, including the cover page, 
instructions, and example items. The reader may find it helpful 
to refer to these now in order to get a better understanding of 
the nature of the SEVTE. 

*The paragraphs on the SEVTE forms range from 87 to 121 
words in length, averaging 99 words per paragraph. 
2.2.2. Test Taking 

Examinees are given 35 minutes to complete the first two 
subsections (WPT and ST) and 48 minutes to complete the paragraph 
subsection. They are permitted to use dictionaries only in 
translating the paragraphs. 
2.2.3. Scoring 

As noted above, examinees write their translations in the 
test booklet. Each subsection is scored by a trained rater 
according to the procedures outlined below. 
2.2.3.1. Words or Phrases in Sentences Items 

The keys for this subsection are quite comprehensive, 
containing a number of acceptable translations for each item. 
However, when scoring the test a rater is free to accept 
other appropriate translations that are not included in the key 
if he or she believes that the translation is correct. The items are 
scored as either correct or incorrect, regardless of whether an 
error consists of incorrect grammar, word choice, or syntax. One 
point is awarded for each correct translation; hence, the maximum 
score for this subsection is 15 points. 
2.2.3.2. Sentence Translation Items 

The keys for this subsection contain several acceptable 
translations for each item, although the keys do not purport to 
list all possible acceptable translations. A trained rater 
assesses the Accuracy of the translations, i.e., the extent to 
which the original meaning has been appropriately conveyed. From 
0 to 5 points are awarded for the translation of each sentence, 
according to the scoring guidelines found in Appendix E. As 
there are 10 sentences, a maximum of 50 points is possible for 
this subsection. 

2.2.3.3. Paragraph Translation Items 

The keys for this subsection provide only one translation 
for each paragraph, even though a number of slightly different 
but acceptable versions are possible. The example translation is 
intended to provide a standard interpretation of the source text, 
and raters may use their expertise in the language to judge 
whether variations in examinee renditions remain faithful to the 
original meaning. On the other hand, the rater training 
materials provide several examples of translations at different 
ability levels, along with appropriate scores for each 
translation. 

Examinee translations are evaluated for correctness of 
Grammar (morphology), Expression** (in the case of the paragraph 
translation items only, Expression refers to word order and 
vocabulary), Mechanics (spelling and punctuation), and Accuracy 
(as described above). From 0 to 5 points are awarded in each 
category according to the guidelines located in Appendix F. 
Since there are three Paragraph Translation items, a total of 60 
points is possible for this subsection: 15 points for Accuracy 
and 45 for Expression. 

**The reader is advised not to confuse paragraph expression 
with the overall Expression score. The overall Expression score 
includes all criteria referred to in the SLDs other than 
Accuracy. 
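
The arithmetic of the paragraph subsection can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `ParagraphRating` and `paragraph_subtotals` are hypothetical and do not appear in the scoring materials, and the actual 0 to 5 ratings are assigned by a trained rater using the Appendix F guidelines, which are not reproduced here.

```python
from dataclasses import dataclass

MAX_POINTS = 5  # each category is rated 0-5 for each paragraph


@dataclass(frozen=True)
class ParagraphRating:
    """Hypothetical container for one rater's scores on one paragraph."""
    grammar: int
    expression: int  # paragraph-level Expression: word order and vocabulary
    mechanics: int   # spelling and punctuation
    accuracy: int

    def __post_init__(self):
        # Reject ratings that fall outside the 0-5 scale.
        for name in ("grammar", "expression", "mechanics", "accuracy"):
            value = getattr(self, name)
            if not 0 <= value <= MAX_POINTS:
                raise ValueError(f"{name} must be 0-{MAX_POINTS}, got {value}")


def paragraph_subtotals(ratings):
    """Return (accuracy_points, expression_points) for the three paragraphs."""
    if len(ratings) != 3:
        raise ValueError("each form has three Paragraph Translation items")
    accuracy = sum(r.accuracy for r in ratings)
    # Grammar, paragraph Expression, and Mechanics all feed the Expression total.
    expression = sum(r.grammar + r.expression + r.mechanics for r in ratings)
    return accuracy, expression
```

With the highest rating in every category, the sketch yields the maxima stated in the text: 15 points for Accuracy and 45 for Expression, 60 in all.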
2.3. Computation of Total Scores 

A total score is computed separately for Accuracy and 
Expression. (See the discussion of these constructs in section 
1.5.3.) A maximum score of 185 points (80 for Accuracy and 105 
for Expression) is possible for the entire exam. The total for 
Accuracy and Expression is then converted to a Translation 
proficiency rating (one of the new CAL/FBI Skill Level 
Descriptions) using the conversion tables (one for each exam 
form) found in Appendix O. The development of these conversion 
tables is described in section 6.3 of this report. 

The total score for Expression is composed of the 60 items 
in the Multiple Choice section, which are worth up to 60 points, 
plus the sum of the points earned for Grammar, Expression, and 
Mechanics (up to 45 possible) on the Paragraph Translation 
subsection of the Production section. Thus, the examinee may 
obtain a raw score of up to 105 points for Expression. 

The total score for Accuracy is composed of the 80 points 
that may be earned on the Production section. The examinee may 
earn 15 points for Accuracy in the Word and Phrase Translation 
items, 50 points for Accuracy in the Sentence Translation items 
(up to 5 points for each of 10 sentences) and 15 points for 
Accuracy on the three paragraphs (up to five points per 
paragraph).* 
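
The composition of the two raw totals described in this section can be sketched in Python as follows. The function name `total_scores` and its parameter names are assumptions made for illustration only. The sketch stops at the raw totals; the conversion to a proficiency rating depends on the Appendix tables and is not attempted.

```python
def total_scores(mc_correct, wpt_correct, st_points, pt_accuracy, pt_expression):
    """Combine subsection results into raw Accuracy and Expression totals.

    mc_correct:    Multiple Choice items answered correctly (0-60)
    wpt_correct:   Word or Phrase Translation items scored correct (0-15)
    st_points:     the ten Sentence Translation ratings, each 0-5
    pt_accuracy:   Accuracy points over the three paragraphs (0-15)
    pt_expression: Grammar + Expression + Mechanics points over the
                   three paragraphs (0-45)
    """
    def check(name, value, upper):
        # Guard against values outside the range stated in the report.
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}, got {value}")

    check("mc_correct", mc_correct, 60)
    check("wpt_correct", wpt_correct, 15)
    check("pt_accuracy", pt_accuracy, 15)
    check("pt_expression", pt_expression, 45)
    if len(st_points) != 10:
        raise ValueError("expected ratings for 10 Sentence Translation items")
    for points in st_points:
        check("sentence rating", points, 5)

    # Accuracy comes entirely from the Production section (maximum 80).
    accuracy = wpt_correct + sum(st_points) + pt_accuracy
    # Expression combines the Multiple Choice score with the paragraph
    # Grammar, Expression, and Mechanics points (maximum 105).
    expression = mc_correct + pt_expression
    return {"accuracy": accuracy, "expression": expression}
```

Because the subsections are simply summed, the ten sentences (50 points) carry more than three times the weight of the three paragraphs (15 points) in the Accuracy total.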

2.4. Use of Multiple Choice Section for Screening 

The Multiple Choice section may be used to screen out 
individuals for whom the Production section of the exam would be 
inappropriate. Since the minimum recommended passing score is 
2.8 or a 2+ on the Translation Skill Level Descriptions, 
examinees who have some reasonable chance of scoring at this 
level should not be screened out. Prior FBI policy has 
established a 2.0 as a screen (previously based on a DLPT reading 
score), and CAL was requested to continue this practice by using 
the Multiple Choice section score corresponding to a 2.0 on the 
entire SEVTE as a screen. Through statistical analyses 
(described in section 8.4), we have determined that the raw score 
cut-off on the Multiple Choice section should be 13 for Form 1 
and 25 for Form 2. Examinees scoring at or below these scores 
need not take the Production section of the SEVTE, since they are 
unlikely to have a translation skill level at 2.8 or above when 
the entire exam is administered. If they have already taken the 
Production section, it need not be scored. 
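
The screening rule can be written as a small Python sketch. The names `SCREENING_CUTOFFS` and `needs_production_section` are hypothetical; the cut-off values are those given above for each form, and the "at or below" comparison follows the wording of this section.

```python
# Raw-score cut-offs on the Multiple Choice section, keyed by form number,
# as stated in the text above.
SCREENING_CUTOFFS = {1: 13, 2: 25}


def needs_production_section(form, mc_raw_score):
    """Return True if the examinee should go on to (or have scored) the
    Production section; False if screened out at or below the cut-off."""
    if form not in SCREENING_CUTOFFS:
        raise ValueError(f"unknown form: {form}")
    if not 0 <= mc_raw_score <= 60:
        raise ValueError("Multiple Choice raw score must be between 0 and 60")
    return mc_raw_score > SCREENING_CUTOFFS[form]
```

For example, an examinee with a raw score of 13 on Form 1 would be screened out under this reading, while a raw score of 14 would not.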



*As explained later in this report, a multiple regression 
analysis did not improve on this raw score weighting. Thus, it 
was decided to use this weighting to calculate the total score 
for Accuracy. The effect of this weighting is that the Sentence 
Translation subsection counts more than three times as much as 
the Paragraphs subsection. 
3. Development of the SEVTE 

This section describes the development of the two pilot 
forms of the SEVTE. The preparation of examination materials and 
the development of pilot study scoring methods are also 
discussed. 

3.1. Exam Forms 

Items for the SEVTE were developed by CAL staff and 
consultants, taking into account the survey of FBI 
translation needs (see section 1.3), the results of which are 
reported in Appendix Q of this report. They relied on their 
expertise as translators and teachers in developing the items. 
The item developers sought to test aspects of Spanish that are 
especially challenging to translate because there is no direct 
equivalent in English. The developers also focused on aspects of 
grammar that have traditionally caused problems for 
Spanish/English translators and students because there is no 
direct correspondence between the two languages. These areas 
include pronouns, verb tenses and sequence of verb tenses, use of 
negatives, possessives, prepositions, and non-temporal verb forms 
(infinitives, gerunds, past participles), among others. 

A number of item texts were either excerpted directly from 
documents provided by the FBI or were paraphrases of such 
documents. In addition, many items were paraphrased from 
newspaper and magazine articles and documents encountered in the 
professional work of the item developers. The developers 
selected the material carefully, so that the topics and 
vocabulary of the item texts would be consistent with the type of 
documents FBI employees reported being required to translate on 
the survey of FBI translation needs. 

Parallel forms were organized by matching items according to 
point being tested (specific grammar point or vocabulary) and by 
matching them in terms of difficulty on the FBI/CAL SLDs for 
translation. This latter matching required the test developers 
to make an estimate of the difficulty of rendering the 
translation, rather than of the difficulty of the language of the 
item itself in either the source or target language. The items 
were originally arranged in order of increasing difficulty. More 
items were developed than we anticipated would be needed on the 
final forms, so that items that did not function effectively 
could be discarded after pilot testing. Originally, there were 
63 items (35 Words or Phrases in Context and 28 Error Detection) 
in the Multiple Choice section of Form 1, and 64 items (35 Words 
or Phrases in Context and 29 Error Detection) in the Multiple 
Choice section of Form 2. The Production sections of both forms 
contained 23 Word or Phrase Translation items, 16 Sentence 
Translation items, and three Paragraph Translation items. 

Following extensive internal review, CAL sent the SEVTE exam 
forms to the FBI for preliminary approval and revised them 
according to FBI suggestions prior to trialing. 
3.2. Pilot Test Scoring Procedures 

Answer keys were prepared for the Multiple Choice and 
Production sections. The keys were reviewed by FBI staff 
members, and a number of their suggestions were incorporated in 
making revisions. 

Examinee responses to the Multiple Choice section were to be 
scored by an optical scanner, which would tabulate the number of 
correct answers. Similarly, examinee translations of the Word or 
Phrase Translation items in the Production section were to be 
scored by raters as being either correct or incorrect, according 
to the keys which had been prepared. 

In contrast, scoring of the Sentence Translations and 
Paragraph Translations was to be based on the new FBI/CAL 
Translation Skill Level Descriptions. The Translation Skill 
Level Descriptions were intended to characterize an examinee's 
performance on a range of materials. Thus, it was not possible 
to use them to score individual sentence items because these item 
texts were too restricted. Consequently, CAL staff developed 
simplified scoring guidelines, based on the FBI/CAL translation 
skill level descriptions, for evaluating both ST and PT items. 

In preparation for writing the simplified guidelines, the 
FBI/CAL skill level descriptions were reorganized so that all 
proficiency levels were described within each category, i.e., 
Grammar, Syntax, Vocabulary, Mechanics, Accuracy, and Style and 
Tone. (For example, references to grammar in levels 0+ - 5 were 
all placed on the same page.) 

After studying these reorganized skill level descriptions, 
an attempt was made to characterize each level succinctly within 
each category. The plus levels were eliminated, so that the 
scale consisted of 0-5 points in each category. Because exam 
texts were based primarily on legal and business documents (i.e., 
formal writing), which did not vary much in terms of Style and 
Tone, it was decided not to include Style and Tone as a separate 
category in the scoring system. The Vocabulary category was also 
eliminated, since aspects of this category could be subsumed 
under Expression and Accuracy. Finally, correctness in Mechanics 
(spelling and punctuation) was expressed in terms of numbers of 
errors for the Sentence Scoring Grid, and proportions of items 
correct for the Paragraph Scoring Grid. The pilot version of the 
Sentence Scoring Grid is located in Appendix G; the Paragraph 
Scoring Grid can be found in Appendix H. 
4. Trialing and Pilot Testing 

This section describes the trialing and piloting of the 
SEVTE. The results of the piloting and subsequent revisions are 
also discussed. 
4.1. Trialing 

The trialing of the two forms of the SEVTE was carried out 
at CAL on February 20 and 21, 1989. Three CAL employees and the 
spouse of a CAL employee took the exams. The Spanish oral 
proficiency levels of these four people varied from level 2+ to 
level 5, the latter being a practicing attorney who is an educated 
native speaker from Argentina. 

Before taking each form, examinees received a questionnaire 
that asked them to provide a global rating of their English and 
Spanish proficiency (see Appendix J). After completing each 
section of the test, they commented on it and noted on the exam 
feedback questionnaire (see Appendix K) specific errors or 
problems they encountered. 

CAL examined the responses to each item as well as to the 
questionnaire in order to determine which items should be 
modified and which should be deleted, and the exam forms were 
revised accordingly. 

On March 29, 1989, two FBI translators each took either Form 
1 or Form 2 of the SEVTE. They provided written feedback to CAL 
which was taken into consideration in revising the exams after 
the pilot testing. 
4.2. Pilot Testing 

This section describes the SEVTE pilot data collection, the 
results of pilot testing, and the revisions that were made 
following data analysis. 
4.2.1. Data Collection 

The SEVTE exam forms were piloted at Georgetown University 
on April 1, 1989. Forty-five students from the Department of 
Translation and Interpretation completed the Multiple Choice 
sections of both forms together as a group. Each student was 
paid $?5.00 for taking both sections. Graduate students in the 
Translation Certificate program took the complete exam; four 
students took Form 1 and five took Form 2. Each of these 
students was paid $15 for taking one form of the entire SEVTE 
exam. 

The Georgetown University students kept track of how many 
minutes it took them to complete each section of the exam. They 
also completed a questionnaire regarding their native language 
background and their proficiency in English and Spanish. 
(Appendix M contains a copy of the questionnaire; a summary of 
the responses of examinees is also located in Appendix M. The 
data in this summary represents all examinees who participated in 
the pretesting, including those graduate students who took either 
the SEVTE or the ESVTE.) In addition, we asked students to 
comment on any items that were confusing or that caused them 
particular difficulty. 

Of the 48 students who participated in the pretesting, 
English was the native language of 41. Seven students indicated 
another native language, but knew some Spanish. These other 
native languages were Portuguese, Tagalog, Korean, Chinese, 
Russian, and Italian. 
4.2.2. Results 

Table 1 displays a summary of the performance of the pilot 
study examinees on the Multiple Choice sections of the SEVTE exam 
forms. Reliability estimates, calculated using Kuder-Richardson 
formula 20 (KR-20), are also shown.[13] 



Table 1 

SEVTE Multiple Choice Sections 
Total Pilot Sample 

Form    N     Mean    %     Std. Dev.   KR-20 
1       47    45.6    72    5.65        .73 
2       48    48.0    75    6.01        .76 

There were 63 items on the pilot version of Form 1, and 64 
on Form 2. Using the mean percentage correct to compare the two 
forms, it is apparent that Form 1 was slightly more difficult 
than Form 2, although both forms appeared to be somewhat easy for 
this group of examinees.[14] The reliability estimates were low, 
indicating that some of the items were not functioning as 
intended (i.e., they were either too easy or too difficult, or 
failed to discriminate among high and low proficiency examinees). 

[13] KR-20 yields an estimate of the internal consistency of 
the test items, i.e., a measure of the extent to which examinees 
perform consistently across the items within a test. It is very 
similar to parallel form reliability. 

[14] A four-option, multiple choice exam of optimal difficulty 
would exhibit a mean score of 62.5% correct. 

A record was kept of the time it took students to complete 
the Multiple Choice sections. The amount of time required ranged 
from 24 to 31 minutes. 

Since only a few examinees took the Production sections, 
descriptive statistics for this section were not calculated. The 
principal goals in piloting the Production sections were to 
evaluate the appropriateness of the scoring system, and to 
identify items that were either ambiguous, too easy, or too 
difficult. 
4.2.3. Revisions 

Students were divided by native language background 
(English, and other), and item analyses were conducted of their 
responses to the Multiple Choice section items. The item 
analyses showed that the items were easier for the native English 
speakers. (A majority of those who participated in the piloting 
were native English speakers.) 

Seven nonnative speakers of English, from backgrounds other 
than Spanish, also took the SEVTE. (Unfortunately, no native 
Spanish speakers took this exam.) Since the item analyses showed 
that many items on both forms of the Multiple Choice section were 
quite easy for nonnative as well as for native English speakers, 
it was necessary to write a number of new items and to revise 
many of the existing items to make them more difficult. The 
revision process involved deleting some items entirely and 
replacing others with new items that assessed a similar grammar 
point or vocabulary item. Some of the distractors in a number of 
the remaining items were also modified. In addition, items that 
did not discriminate well among high and low proficiency 
examinees in the total sample were eliminated. Finally, comments 
written by students after completing the exam were taken into 
consideration in identifying items for revision. We decided to 
include 35 Word or Phrase in Context items and 25 Error Detection 
items, for a total of 60 items, in the final form of the Multiple 
Choice section. This is slightly fewer than the 63 and 64 items 
included on the field test versions of the SEVTE. 

For the final version of Form 1, 30 (50%) new items were 
developed, and 23 (16%) distractors were modified; for Form 2, 27 
(45%) new items were developed, and 20 (14%) distractors were 
revised. In general, the new items were designed to be more 
difficult, while the distractors were rewritten so that they 
would be more attractive to examinees. 

Responses to the Production sections were scored by CAL 
staff and consultants in order to try out the scoring procedures 
and to gather information that could be used in revising items. 
As with the Multiple Choice section, the Production section items 
were analyzed in light of student performance (and comments from 
FBI staff as noted above). It was decided to include 15 embedded 
phrase, 10 sentence, and 3 paragraph translation items on the 
final versions of the exam forms. Seventeen (59%) of the phrase 
and sentence items were deleted from Form 1, and 3 new items were 
created; 18 (62%) were deleted from Form 2, and 4 new items were 
created. None of the paragraph items were modified. 

[illegible] were revised to reflect the changes 
described above, and copies were made in preparation for the 
validation study described in section 5 of this report. 
5. Validation Study 

[...] reliability and validity of the SEVTE as a measure of translation 
ability. In this context, the validation study had a number of 
specific aims. One aim was to field test the revised exam to see 
if its items and sections performed acceptably. Another aim was 
to administer the test to a more appropriate population than the 
pretest versions' population in order to set passing scores based 
on their performance.[15] Another aim was to further assess the 
rating criteria that had been developed for scoring each part of 
the Production section. Another was to determine whether this 
section could be scored reliably. The validation study, as the 
word validation implies, also sought to gather information on the 
validity of the test. With the analysis of construct validity in 
mind, it was decided to collect scores on other measures from 
employee files and to assess the test's ability to predict 
overall translation ability by having raters make an overall 
assessment of ability using the FBI/CAL Translation SLDs. 
Another aim of the validation study was to gather evidence 
concerning criterion-related validity by having examinees rate 
their ability to translate various types of texts on the job, and 
then determine the relationship between scores on the test and 
the self-ratings. We chose to use self-ratings, rather than 
supervisor's ratings, because we were advised by the FBI that 
supervisors would not be in a position to evaluate translation 
ability. Another aim was to see if examinees felt the test 
to be a valid test of their translation ability. An additional 
aim was to gain a further understanding of the constructs the 
test measured; at the time we were not sure if we were measuring 
a single construct, two or more constructs, or whether we were 
measuring a test method effect (recognition versus production).[16] 
Another purpose of the validation study was to determine the most 
appropriate weighting of the parts and sections. A final purpose 
of the validation study was to gather the data necessary to 
equate the two parallel forms of the test. This section 
describes the validation study design and data collection 
procedures. The results of the study are discussed in the 
following three sections. 

[15] The population that took the field test version consisted 
mostly of university students. 
5.1. Overview 

The original design of the validation study called for 
administering the SEVTE to FBI Language Specialists and Contract 
Linguists at various field offices around the country. It was 
hoped that individuals of varying ability levels would be 
included in the sample. In order to examine the validity of the 
SEVTE, scores on other measures of language ability were obtained 
from employee files as available. 

[16] This degree of uncertainty and the multiple aims of the 
validation study were due to the fact that so little was known 
about the measurement of translation ability at the time the 
project began. Thus, the validation study, and indeed the entire 
project, combined experimentation with a commitment to develop 
and validate a test. To draw an analogy to the business world, 
it is as if we were carrying out both the research and 
development function and the manufacturing function at the same 
time. Under normal circumstances the manufacturing function is 
carried out after the R+D function has been completed. While far 
from ideal, the reality of our situation was that we were working 
under a fixed-price contract to manufacture a test. The client 
was aware of the possibility of R+D problems, but it was assumed 
that these would be worked out along the way. 

Both forms of the SEVTE were given in one sitting (about 
four hours in duration) at each of seven FBI field offices. The 
order of administration of the forms was counterbalanced to 
control for the practice effect. Thus, approximately half of the 
examinees took Form 1 first and the other half took Form 2 first. 
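
The counterbalanced order of administration described above can be sketched as follows. This is only an illustration: the report does not say how examinees were assigned to an order, so simple alternation is an assumption, and the function name and examinee labels are hypothetical.

```python
def counterbalance(examinee_ids):
    """Assign a form order to each examinee by simple alternation.

    Assumption: alternation is used here only for illustration; the
    report does not state how the two orders were actually assigned.
    """
    orders = {}
    for index, examinee in enumerate(examinee_ids):
        if index % 2 == 0:
            orders[examinee] = ("Form 1", "Form 2")
        else:
            orders[examinee] = ("Form 2", "Form 1")
    return orders


# Hypothetical examinee labels, not taken from the study.
assignment = counterbalance(["A", "B", "C", "D", "E"])
form1_first = sum(1 for order in assignment.values() if order[0] == "Form 1")
form2_first = len(assignment) - form1_first
```

With an odd number of examinees the two groups differ in size by one, which matches the report's wording that approximately half took each form first.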
5.1.1. Test Administration Instructions 

CAL developed a set of test administration instructions for 
the SEVTE. These include instructions to the test administrator 
regarding the following: 1) test security, 2) assembling test 
materials, 3) arranging for a testing site, 4) equipment, 5) 
administering the test (including timing of sections), and 6) 
procedures to follow after the test. Appendix A contains a copy 
of the administration instructions for the SEVTE. 
5.1.2. Questionnaires 

CAL developed two questionnaires for use in the validation 
study: 1) a self-assessment questionnaire on which an examinee 
was asked to estimate his or her ability to render a verbatim 
translation from Spanish into English, and 2) a questionnaire 
requesting examinee feedback on aspects of the format and content 
of the exam. (A copy of the self-assessment questionnaire is 
located in Appendix N, and a copy of the exam feedback 
questionnaire is in Appendix L.) 
5.1.3. Subjects 

Testing materials, including test administration 
instructions, numbered test booklets, answer sheets, pencils, 
questionnaires, and test administrator report forms,[17] were sent 
to the FBI field offices in Los Angeles, San Diego, Albuquerque, 
Phoenix, and El Paso on November 15, 1989. Similar sets of 
materials were sent to Houston[18] and Puerto Rico on November 17, 
1989.[19] Materials from SEVTE administration were returned to CAL 
within two to eight weeks.[20] 

[17] CAL developed this form for test administrators to note 
any irregularities that may occur with respect to test security, 
the test administration, or the condition of the test materials. 
We requested that the validation study test administrators 
complete and sign the form even if there were no irregularities. 
(See Appendix A for an example of this form.) 

[18] Arrangements were made for members of the Houston Police 
Department (for whom Spanish OPI scores were available) to be 
tested along with the FBI employees at the Houston field office. 

[19] A cover letter was sent with the materials to the contact 
person at each field office. In addition to thanking them for 
their assistance in carrying out the validation study, the letter 
emphasized the importance of test security, outlined the 
procedures for the test administration, noted the proposed 
administration date, and instructed them to return all materials 
to CAL immediately after the test administration. A checklist of 
the materials was enclosed with each cover letter. CAL retained 
a copy of the checklists and used them to verify that all of the 
materials were returned as requested. 

[20] Although most field offices were able to follow the 
administration procedures as outlined, a few had difficulty 
scheduling all of the examinees to be present for the test 
administration, and consequently had to give more than one 
administration of the same exam. These difficulties accounted 
for their delay in returning some of the exam materials. 

Since the FBI Language Specialists were already working in 
Spanish, there were no examinees with low level translation 
ability among them. Also, because of the dire need for the 
services of the FBI's current Language Specialists, it was 
difficult to recruit an adequate number of Language Specialists 
and Contract Linguists for the validation study. Thus, in an 
effort to ensure a minimally adequate sample size, and to ensure 
that the entire range of abilities of potential test takers in 
the operational program (the testing program for applicants) 
would be represented in the sample, the FBI and CAL arranged for 
13 beginning Spanish language students at the CIA to take the 
SEVTE Multiple Choice sections during the first week of April, 
1990. Also, FBI Field Offices were allowed to assign Special 
Agents and bilingual support staff to take the test. In 
addition, CAL contracted three professional translators to take 
the full SEVTE forms. These exams were administered at CAL on 
January 9, 1990. 

Hence, a total of 58 examinees took the SEVTE in the 
validation study. Of this group, 15 (26%) were FBI Special 
Agents, 11 (19%) were FBI Language Specialists (or Contract 
Linguists, who do similar work), 10 (17%) were FBI support staff, 
6 (10%) were members of the Houston Police Department, 13 (22%) 
were CIA Spanish language students, and 3 (5%) were professional 
translators. It should be reiterated that while it was 
originally envisioned that the subjects of the validation study 
would be limited to Language Specialists, we were unable to 
secure release time for an adequate sample of Language 
Specialists. [...] 
FBI Headquarters staff, it was decided to include other FBI 
personnel (Special Agents and support staff) in the validation 
sample, as well as the other groups that were represented. 
5.2. Scoring 

The Multiple Choice parts of the SEVTE forms were scored by 
machine, using answer keys based on the revised versions of the 
forms. 

The Production parts were scored by the same raters (Matilde 
Farren and Mary Lee Scott) who scored the pilot study data, using 
the scoring keys and analytic sentence and paragraph guidelines 
which had been prepared. Word and Phrase Translation items were 
scored using a key of acceptable responses, which has been 
provided to the FBI. Sentence Translation items were scored 
using the Sentence Accuracy Scoring Guidelines (see Appendix E). 
These focused on the presence of mistranslations, omissions, 
and inappropriate additions in the content of the translation, as 
well as on the conveyance of all appropriate nuances. 

In order to determine which scoring system was most 
efficient and yielded the highest interrater reliability, the 
Paragraph Translations were scored in two ways: a) using the 
analytic paragraph guidelines, and b) using the FBI/CAL 
translation skill level descriptions. The SEVTE Paragraph 
Scoring Guidelines (see Appendix F) require the rater to assign 
each paragraph from 0-5 points on each of four criteria: 
grammar, expression, mechanics, and accuracy. The totals for the 
first three criteria, grammar, expression, and mechanics, are 
summed to produce the Expression score for the Production 
section. The ratings from Accuracy are summed and contribute to 
the total Accuracy score, which is earned exclusively on the 
Production section of the SEVTE. The scoring guidelines for 
grammar require the rater to distinguish between errors in simple 
and complex structures, between low frequency and high frequency 
structures, and to consider the number of errors of each type in 
each paragraph. The scoring guidelines for expression require 
the rater to evaluate the paragraph for word order, vocabulary, 
idiomaticity, style, and tone. After consideration of these, the 
rater makes a judgement as to the degree to which the translation 
follows the conventions of the source language or the target 
language. The scoring guidelines for mechanics require the 
rater to evaluate each paragraph for the frequency of errors in 
spelling, punctuation, and capitalization. The scoring 
guidelines for Accuracy are identical to the scoring guidelines 
for Sentence Translation items. Additional information on the 
scoring procedures can be found in sections 2.1.3 and 2.2.3 of 
this report. 
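
The way the paragraph ratings roll up into the two Production scores can be sketched as follows. This is a minimal illustration, assuming one rater and three paragraph items per form; the function name and the example ratings are hypothetical and are not drawn from the study data.

```python
EXPRESSION_CRITERIA = ("grammar", "expression", "mechanics")
ALL_CRITERIA = EXPRESSION_CRITERIA + ("accuracy",)


def score_paragraphs(ratings):
    """Sum one rater's 0-5 ratings across paragraphs.

    `ratings` is a list with one dict per paragraph, mapping each of the
    four criteria to a rating between 0 and 5.
    """
    for rating in ratings:
        for criterion in ALL_CRITERIA:
            if not 0 <= rating[criterion] <= 5:
                raise ValueError(f"{criterion} rating out of range: {rating[criterion]}")
    # Grammar, expression, and mechanics together form the Expression score.
    expression_total = sum(r[c] for r in ratings for c in EXPRESSION_CRITERIA)
    # Paragraph accuracy ratings are one component of the total Accuracy score.
    paragraph_accuracy = sum(r["accuracy"] for r in ratings)
    return {"expression": expression_total, "paragraph_accuracy": paragraph_accuracy}


# Hypothetical ratings for three paragraphs.
example = [
    {"grammar": 4, "expression": 3, "mechanics": 5, "accuracy": 4},
    {"grammar": 3, "expression": 3, "mechanics": 4, "accuracy": 2},
    {"grammar": 5, "expression": 4, "mechanics": 5, "accuracy": 5},
]
scores = score_paragraphs(example)
```

Under these assumptions the Expression score can reach at most 45 points (three paragraphs, three criteria, five points each) and the paragraph contribution to Accuracy at most 15.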

After the scoring of the Production section was complete, 
each rater assigned an overall ability level for Expression and 
Accuracy using the FBI/CAL SLDs, based on evaluation of the 
sentence and paragraph translations. This overall ability level 
was used in order to construct the FBI/CAL Translation Scale 
conversion tables. [...] 
translation ability level could be assigned to each examinee. 
The decision to score Expression and Accuracy separately was made 
by CAL after the data were collected as a result of experience 
gained during the pilot study and after the scoring of an initial 
group of SEVTE papers from the validation study. This decision 
was made to aid in evaluating different types of examinee 
performance. Some translations were very fluent and grammatical 
but inaccurate (as may occur when an examinee's proficiency is 
higher in the target language), while others were mostly accurate 
but evidenced problems with grammar or vocabulary (as may occur 
when an examinee's proficiency is higher in the source language). 

In order to be able to assign separate FBI/CAL Expression 
and Accuracy scores, the original FBI/CAL translation SLDs were 
reorganized so that the descriptions for Expression at each level 
were contained in one section and the descriptions for Accuracy 
in another. A copy of the reorganized SLDs can be found in 
Appendix I. 
6. Reliability 

[...] study test administration are presented in this section by 
subtest. An effort was made to examine reliability in a number 
of ways and from a number of perspectives. It should be 
remembered that this data on reliability is a function of the 
sample tested and the raters used. 

6.1. Multiple Choice Section: Descriptive Statistics and 
Reliability 

Table 2 presents the results of the validation study 
administration of the Multiple Choice section of the SEVTE forms. 
This section is referred to here as MC1 and MC2. 

Table 2 

Descriptive Statistics for SEVTE MC1 and MC2 

Form    N     Mean    Std. Dev.   Minimum   Maximum 
MC1     58    37.5    9.60        9         57 
MC2     58    34.9    10.78       9         55 



As can be seen in Table 2, the mean score on MC2 was 2.6 
points lower than on MC1. Thus, MC2 appears to be somewhat more 
difficult than MC1. However, given the magnitude of the standard 
deviation on both tests, the difference between the two means is 
not significant. The larger standard deviation for MC2 suggests 
that less competent examinees may have tended to score slightly 
lower and more competent examinees slightly higher on MC2 than 
they did on MC1. 

As there were a total of 60 items in the Multiple Choice 
section, the mean of MC1 represents 62.5% correct, while the mean 
of MC2 represents approximately 58.2% correct. Thus, MC1 appears 
to be of optimal difficulty, while MC2 is slightly more difficult 
than would be ideal for this sample.[21] Indeed, the lowest score 
on both forms (9) was quite a bit lower than what would be 
expected by chance alone (15). This apparently occurred because 
a few of the lower ability examinees were not able to complete 
the Multiple Choice section in the time allotted. 
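
The percentages and the chance score cited above can be checked with a few lines of arithmetic. The sketch below takes the 62.5% optimum as the midpoint between the chance level and a perfect score on a four-option item, which is one common rule of thumb; the report gives the figure but not its derivation, so that reading is an assumption. The variable and function names are hypothetical.

```python
N_ITEMS = 60     # items in the final Multiple Choice section
N_OPTIONS = 4    # answer options per item


def percent_correct(mean_raw_score, n_items=N_ITEMS):
    """Express a mean raw score as a percentage of the items."""
    return 100 * mean_raw_score / n_items


# Expected raw score and percentage from blind guessing.
chance_score = N_ITEMS / N_OPTIONS
chance_percent = 100 / N_OPTIONS

# Assumed rule of thumb: halfway between chance and a perfect score.
optimal_percent = (chance_percent + 100) / 2

mc1_percent = percent_correct(37.5)
mc2_percent = percent_correct(34.9)
```

On these figures MC1 sits at the assumed optimum, MC2 falls a few points below it, and the observed minimum of 9 lies under the chance score of 15.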

Table 3 presents the KR-20 reliability estimates for the two 
forms of the Multiple Choice section based on the validation 
study sample. KR-20 is a measure of internal consistency 
reliability, which is the degree to which the items (considered 
as a set) on a test measure the same ability. 
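
As a rough illustration of how a KR-20 estimate is obtained, the sketch below computes it from a matrix of right/wrong item scores. The function name and the small response matrix are hypothetical and are not SEVTE data; the population variance of the total scores is used, which is one common convention.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    `responses` holds one list per examinee, with 1 for a correct
    answer and 0 for an incorrect answer on each item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for item in range(n_items):
        p = sum(person[item] for person in responses) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


# Hypothetical responses: five examinees, four items.
example_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
example_kr20 = kr20(example_responses)
```

Items that nearly everyone answers correctly or incorrectly contribute little variance, which is one reason very easy items tend to depress the estimate.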



Table 3 

KR-20 Reliability for MC1 and MC2 

MC1     .89 
MC2     .91 

The reliability of the Multiple Choice section of both SEVTE 
forms is high and indicates that either form can be used with 
confidence on a population similar to that of the validation 
study. 

[21] We would expect a mean around 62.5% on a four-option, 
multiple choice test of optimal difficulty for the population, 
when the sample fully and equally represents the total range of 
abilities in the population. 

A second indication of the reliability of the section is the 
consistency of performance of the group of 58 subjects on the two 
forms. Referred to as the coefficient of equivalence or parallel 
form reliability, this type of reliability is obtained by 
calculating the Pearson Product Moment correlation between 
subjects' performance on the two different forms. For the 
multiple choice section on the two SEVTE forms, the coefficient 
of equivalence is .81. This is within acceptable limits. 
Together, both the KR-20 reliability estimates and the 
coefficient of equivalence are adequately high, indicating that 
the two main sources of measurement error (inconsistency across 
items and inconsistency across forms) are minimal for the 
Multiple Choice section of the SEVTE. 
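
The coefficient of equivalence is an ordinary Pearson product-moment correlation between each examinee's scores on the two forms. A minimal sketch follows; the function name and the paired scores are hypothetical and are not the validation study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical raw scores for five examinees on the two forms.
form1_scores = [30, 42, 25, 51, 38]
form2_scores = [28, 40, 27, 49, 33]
equivalence = pearson_r(form1_scores, form2_scores)
```

The same function applies to interrater reliability when the two lists hold the scores that two raters assigned to the same papers.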
6.2. Production Section: Descriptive Statistics and 
Reliability of the Accuracy Score 

Table 4, which follows, shows the descriptive statistics for 
the SEVTE-Accuracy Subsections and Totals by form and by rater. 
Close examination of the means in Table 4 shows that the 
difficulty of the two forms is very similar. Averaging the 
scores assigned by both raters, we see that the Word and Phrase 
Translations seem to be slightly harder on Form 2 (7.0 versus 
7.85 on Form 1), while the Sentence Translations seem to be 
slightly harder on Form 1 (28.55 versus 30.0 on Form 2). The 
Paragraphs seem to be equally difficult on both forms (6.65 on 
Form 1 and 6.55 on Form 2). The two raters appear to be 
consistent in their degree of severity, with Rater 1 always being 
more generous than Rater 2, except in the case of the Sentences 
on Form 2, where they are equally severe. 



Table 4 

Descriptive Statistics for SEVTE Accuracy 
Form 1 (N=45) and Form 2 (N=44) 

Measure          Mean     Std. Dev.   Minimum   Maximum 

Word + Phrase 
R1 F1            9.2      3.0         3         15 
R2 F1            6.5      2.7         2         13 
R1 F2            8.0      3.0         2         14 
R2 F2            6.0      2.9         2         12 

Sentences 
R1 F1            29.2     11.1        4         46 
R2 F1            27.9     9.3         9         47 
R1 F2            30.0     9.4         6         47 
R2 F2            30.0     7.6         9         46 

Paragraphs 
R1 F1            7.0      3.5         0         14 
R2 F1            6.3      2.7         1         12 
R1 F2            7.6      3.0         0         14 
R2 F2            5.5      2.6         0         11 

Total 
R1 F1            46.46    15.88       14.5      73 
R2 F1            41.72    12.73       16        67 
R1 F2            46.97    12.74       26        74 
R2 F2            42.72    10.49       23        68 

Legend: R=rater, F=form. Thus R1 F1 is the score assigned by 
rater 1 on form 1. 



In discussing the reliability of the SEVTE Accuracy scores, 
there are two sources of measurement error that need to be 
examined: inconsistencies across raters and inconsistencies 
across forms. Traditionally these have been examined separately, 
though today generalizability theory allows us to look at both 
together. In this discussion we will first examine these two 
sources of error separately by examining interrater reliability 
and parallel form reliability. We will conclude with an 
examination of the results of a generalizability study on the 
data. 

Table 5 shows the interrater reliability (Pearson Product 
Moment Correlations) of the SEVTE Subsections and the total 
Production section score for Accuracy. The reliability for Form 
1 is listed first, followed by the reliability for Form 2. 



Table 5 
Interrater Reliability of 
SEVTE Production Subsections and Production Total 
for Accuracy (Forms 1+2) 

Form 1 Form 2 
Word and Phrase .86 .85 
Sentences .89 .90 
Paragraph (Accuracy) .74 .78 



Total Accuracy .93 .93 



As can be seen, the interrater reliability estimates of the 
Accuracy scores on all subsections are quite high, with highest 
correlation for Sentence Translation. Across the two forms, the 
correlations for each Accuracy subsection are also highly 
similar. The interrater reliability estimates for the total 
Accuracy score (.93) are high and consistent across forms. 

Table 6 presents the coefficient of equivalence of the 
Accuracy scores across forms and raters. This data is an 
indication of the parallel form reliability of the SEVTE across 
different raters. 

Table 6 

Coefficient of Equivalence for SEVTE Accuracy Scores 
(N=43) 

                   Form 2 Rater 1    Form 2 Rater 2 
Form 1 Rater 1          .85               .89 
Form 1 Rater 2          .89               .89 



As can be seen, the coefficient of equivalence of the SEVTE 
Accuracy score is quite high for a free response test scored by a 
single rater. That is, there is a high degree of agreement 
across forms and raters. This suggests that SEVTE Accuracy 
scores can be highly stable. Even under the most severe 
circumstances, in which an examinee takes different forms that 
are each scored once by a different rater, the scores show a 
remarkable degree of agreement. Thus, it appears that the 
reliability of the SEVTE Accuracy score is high.[22] 

[22] Again, it should be remembered that the consistency of the 
SEVTE Accuracy score is dependent on well trained raters. In an 
operational program, it should be possible to exceed the 
reliability attained in this experimental study. Operational 
raters will have the benefit of being able to train using the 
rater training materials that were developed as part of this 
project. In this study, the raters approached the task of rating 
without the benefit of having undergone a rater training program. 
Ratings were done on an intermittent basis at home. 

In order to more efficiently examine the effects of rater 
severity on the reliability of the SEVTE-Accuracy score, a 
generalizability study (G-study) was undertaken on the total 
SEVTE-Accuracy Score. A G-study is a means of looking at 
multiple sources of variance simultaneously. In this study, the 
two sources of variance investigated were forms and raters. The 
results are presented in Table 7. 



Table 7 

Variance Contributions of Raters and Forms 
to the SEVTE-Accuracy Total Score 

Source of Variance    Variance Component    Standard 
                      Estimate              Error 

Persons               138.665               31.95 
Forms                 -.285*                .10 
Raters                10.120                8.37 
Persons x Forms       11.971                3.94 
Persons x Raters      4.110                 2.39 
Forms x Raters        -.180*                .09 
Residual              11.225                2.39 

*A negative variance estimate is an artifact of the estimation 
procedure. Generally these can be regarded as equivalent to zero 
(Brennan, 1983, p. 103). 

Table 7 shows that the variance due to the raters, forms, or 
any two-way interactions is relatively small in comparison to the 
variance measured among the persons. Indeed, the second highest 
variance component (11.971) is only 8.6% as large as the largest 
component and represents only 6.8% of the total variance of 
176.091. Moreover, the variance due to forms and to form by 
rater interaction is negligible. This argues that differences in 
scores due to forms are minor. 
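
The proportions quoted above follow directly from the Table 7 estimates once the two negative components are treated as zero, as the table note suggests. The short sketch below reproduces that arithmetic; the variable names are hypothetical.

```python
# Variance component estimates from Table 7.
components = {
    "persons": 138.665,
    "forms": -0.285,
    "raters": 10.120,
    "persons x forms": 11.971,
    "persons x raters": 4.110,
    "forms x raters": -0.180,
    "residual": 11.225,
}

# Negative estimates are regarded as zero, following the table note.
clipped = {source: max(value, 0.0) for source, value in components.items()}
total_variance = sum(clipped.values())

# Persons x Forms relative to the largest component and to the total.
relative_to_persons = 100 * clipped["persons x forms"] / clipped["persons"]
share_of_total = 100 * clipped["persons x forms"] / total_variance
```

The clipped components sum to the total of 176.091 given in the text.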

The variance components estimated in a G-study can be used 
in a decision study (or D-study) to estimate the reliability 
(generalizability coefficient) of a test under various conditions 
of the facets being studied. Table 8 presents the estimated 
generalizability coefficients given both raters and forms as 
sources of errors under various groupings of two forms and two 
raters. 



Table 8 

Estimated Generalizability Coefficients for the 
SEVTE-Accuracy Score using Different 
Groupings of Forms and Raters 

Number of    Number of    Generalizability 
Forms        Raters       Coefficient 

1            1            .84 
1            2            .88 
2            1            .90 
2            2            .93 



The results in Table 8 show that the reliability for the 
SEVTE-Accuracy score, when one form and one rater are used, is 
.84, given measurement errors due to both raters and forms. This 
is very high for a rater-scored test. It may be noted that the 
reliability using two forms and two raters (as was the case in 
the validation study for the development of the SEVTE) was a very 
high .93. 
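
The D-study step can be sketched as follows. This assumes a fully crossed persons x forms x raters design and the usual relative-error form of the generalizability coefficient; the report does not print the formula it used, so this is an illustration rather than a record of the authors' procedure. The function name is hypothetical. Under these assumptions the Table 7 components give values that agree with Table 8 when rounded to two decimals.

```python
def g_coefficient(var_persons, var_pf, var_pr, var_residual, n_forms, n_raters):
    """Relative-error generalizability coefficient, crossed p x f x r design.

    Each error component is divided by the number of conditions
    (forms, raters, or both) over which scores are averaged.
    """
    relative_error = (
        var_pf / n_forms
        + var_pr / n_raters
        + var_residual / (n_forms * n_raters)
    )
    return var_persons / (var_persons + relative_error)


# Variance components taken from Table 7.
VAR_PERSONS = 138.665
VAR_PERSONS_X_FORMS = 11.971
VAR_PERSONS_X_RATERS = 4.110
VAR_RESIDUAL = 11.225

# One coefficient for each grouping of forms and raters.
coefficients = {
    (n_forms, n_raters): g_coefficient(
        VAR_PERSONS, VAR_PERSONS_X_FORMS, VAR_PERSONS_X_RATERS,
        VAR_RESIDUAL, n_forms, n_raters,
    )
    for n_forms in (1, 2)
    for n_raters in (1, 2)
}
```

The main effects for forms and raters do not enter this coefficient because they shift all examinees equally; they would matter only for absolute decisions such as a fixed cut score.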

6.3. Production Section: Descriptive Statistics and Reliability 
of the Expression Score 

Table 9 below shows the SEVTE-Expression descriptive 
statistics (raw scores) for the Production section of the test by 
form and by rater. In the Production section, only the Paragraph 
Translations are rated for Expression. They are rated for the 
three criteria that figure into the total score for Expression. 
These criteria are Grammar (morphology), Expression (syntax and 
vocabulary), and Mechanics (spelling and punctuation). 

Table 9 

Descriptive Statistics for SEVTE Expression: Paragraphs Subsection 
Form 1 (N=45) and Form 2 (N=44) 

Measure          Mean     Std. Dev.   Minimum   Maximum 

Grammar 
R1 F1            9.5      4.2         0         15 
R2 F1            9.0      3.4         2         15 
R1 F2            11.0     3.4         0         15 
R2 F2            9.3      3.6         0         15 

Expression 
R1 F1            6.3      3.3         1         15 
R2 F1            7.5      2.8         1         13 
R1 F2            8.3      2.9         0         15 
R2 F2            7.1      2.9         0         13 

Mechanics 
R1 F1            10.4     3.8         3         15 
R2 F1            10.7     3.8         2         15 
R1 F2            11.9     3.4         0         15 
R2 F2            10.0     3.6         0         15 

Total (for Expression production section) 
R1 F1            30.3     8.7         4         45 
R2 F1            31.4     6.6         11        42 
R1 F2            34.3     4.5         25.5      45 
R2 F2            29.0     6.6         16.5      42 

Legend: R=rater, F=form. Thus R1 F1 is the score assigned by 
rater 1 on form 1. 

Close examination of Table 9 shows that the difficulty of 
the two forms is very similar. Averaging the scores assigned by 
both raters, we see that the Paragraph Translation Expression 
scores seem to be slightly lower on Form 1 for all three scoring 
criteria. For Form 1 grammar the mean is 9.25 versus 10.15 for 
Form 2. For Form 1 expression it is 6.9 versus 7.7 for Form 2. 
For Form 1 mechanics it is 10.55 versus 10.95 for Form 2. For 
the total from this section, the mean on Form 1 is 30.85; for 
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Form 2 it is 31.65. The total means differ by less than 1 point 
indicating that the Production sections of the two forms are 
nearly equal in difficulty as a measure of the construct of 
Expression. 
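
The form-level means quoted in this comparison can be re-derived
from Table 9 by averaging the two raters' means within each form.
The following is a minimal sketch in Python; the dictionary layout
and the names TABLE_9_MEANS and form_mean are illustrative choices
and do not come from the report.

```python
# Rater-by-form means transcribed from Table 9 (R = rater, F = form).
TABLE_9_MEANS = {
    "Grammar":    {("R1", "F1"): 9.5,  ("R2", "F1"): 9.0,
                   ("R1", "F2"): 11.0, ("R2", "F2"): 9.3},
    "Expression": {("R1", "F1"): 6.3,  ("R2", "F1"): 7.5,
                   ("R1", "F2"): 8.3,  ("R2", "F2"): 7.1},
    "Mechanics":  {("R1", "F1"): 10.4, ("R2", "F1"): 10.7,
                   ("R1", "F2"): 11.9, ("R2", "F2"): 10.0},
    "Total":      {("R1", "F1"): 30.3, ("R2", "F1"): 31.4,
                   ("R1", "F2"): 34.3, ("R2", "F2"): 29.0},
}


def form_mean(cells, form):
    """Average the rater means reported for one form."""
    values = [mean for (_rater, f), mean in cells.items() if f == form]
    return sum(values) / len(values)


# One rounded mean per measure and form, e.g. form_means["Total"]["F1"].
form_means = {
    measure: {form: round(form_mean(cells, form), 2) for form in ("F1", "F2")}
    for measure, cells in TABLE_9_MEANS.items()
}

for measure, by_form in form_means.items():
    gap = round(by_form["F2"] - by_form["F1"], 2)
    print(f"{measure:<10} Form 1 = {by_form['F1']:.2f}  "
          f"Form 2 = {by_form['F2']:.2f}  difference = {gap:.2f}")
```

Note that this averages the two rater means with equal weight, as
the text does, rather than pooling individual examinee scores.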

As in the discussion of the reliability of the Accuracy
scores, we will first look at interrater reliability and parallel
form reliability separately. Table 10 shows the interrater
reliability estimates (Pearson Product Moment Correlations) of
the SEVTE Production subsections and the total Production section
score for Expression. These scores are all based on the
Paragraph Translation subsection of the Production section of the
test. The reliability for Form 1 is listed first, followed by
the reliability for Form 2.

Table 10

Interrater Reliability of
SEVTE Production Subscores and Production Total (Forms 1+2)

                              Form 1     Form 2

Paragraphs-Grammar              .53        .67
Paragraphs-Expression           .81        .83
Paragraphs-Mechanics            .66        .87
Total Expression*               .83        .86

*Total for Expression is for the total of the three Expression
subscores on Paragraphs only.
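
Each entry in Table 10 is a Pearson product-moment correlation
between the scores two raters gave the same examinees. As a
hedged illustration of how such an estimate is computed, the
sketch below implements the formula in plain Python. The two
score lists are hypothetical and are not data from this study;
the function name pearson_r is likewise an illustrative choice.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical Expression totals awarded by two raters to six examinees.
rater_1 = [30, 25, 41, 18, 36, 29]
rater_2 = [32, 22, 40, 21, 33, 31]

interrater_r = pearson_r(rater_1, rater_2)
print(f"interrater reliability estimate: {interrater_r:.2f}")
```

A correlation of this kind reflects agreement in the rank ordering
of examinees, not agreement in the absolute level of the scores,
so a consistently more lenient rater can still correlate highly
with a stricter one.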

The interrater reliabilities for the three Expression
criteria are not as high as they were for the Accuracy scores,
and the interrater reliability was lower for Form 1 than for Form
2.* Still, the interrater reliability for the total Expression
score earned on the Production section is quite respectable.

Table 11 presents the coefficient of equivalence of the
total Expression scores on the Production section across forms
and raters. This data is an indication of the parallel form
reliability of the SEVTE across different raters.



Table 11

Coefficient of Equivalence for SEVTE Expression Scores
(Production Section only, N=43)

                   Form 2 Rater 1    Form 2 Rater 2

Form 1 Rater 1          .61               .67
Form 1 Rater 2          .69               .79



This data, unlike that for the Accuracy scores, indicates
that raters were less consistent in awarding Expression scores
across the different forms.

In order to examine the combined effects of rater and form
interaction on the reliability of the SEVTE-Expression Production
Subsection, a generalizability study (G-study) was undertaken on
the total SEVTE-Expression Production Score. As in the previous
study, the two sources of variance investigated were forms and
raters. The results are presented in Table 12.

*It should be noted that interrater reliability is a rater
characteristic, not a test characteristic. Nevertheless, a test
developer must present information on interrater reliability. In
the future, the interrater reliability of the SEVTE will depend
on the reliability of the individuals who score the SEVTE.
Raters in the SEVTE operational program, however, will have the
advantage of having available training materials that were
generated as a by-product of this study. Thus, these SEVTE
operational raters should exceed the reliability of raters in
this developmental study. In this study, the raters approached
the task without the benefit of having undergone a rater training
program. Thus, the raters may have used different scoring
standards at different points during the three months that they
were rating the production section. Ratings were done on an
intermittent basis at home.



Table 12

Variance Contributions of Raters and Forms
to the SEVTE-Expression Production Total Score

Source of             Variance Component     Standard
Variance              Estimate               Error

Persons                   29.170               7.52
Forms                     -5.379*              4.41
Raters                    -3.321*              4.72
Persons x Forms            6.737               2.69
Persons x Raters           -.670*              1.38
Forms x Raters            10.563               8.81
Residual                   9.767               2.08

*The negative variance estimate is an artifact of the estimation
procedure. Generally these can be regarded as equivalent to zero
(Brennan, 1983, p. 103).



Table 12 shows that the variance due to the raters, forms,
and person by rater interaction is relatively small in comparison
to the variance measured among the persons. However, there are
some large variances due to interactions. The forms by rater
interaction, the second highest variance component (10.563), is
36% as large as the largest component and represents 19% of the
total variance of 56.237. This indicates that raters were not
consistent in the way they awarded points across the two forms,
as the data in Table 11 also suggests. This can be illustrated
by comparing the total Expression Production means in Table 9.
On Form 1, Rater 2 is more lenient (31.4 versus 30.3 for Rater
1). On Form 2, however, Rater 1 is more lenient (34.3 versus
29.0 for Rater 2). The variance component due to person by form
interaction is also noteworthy. This indicates that to some
extent examinees were not performing consistently across the two
forms. Finally, the residual amount of variance, which includes
the three-way interaction of persons by forms by raters and any
random variance, is also relatively large. These results
indicate that further training of raters on rating the
paragraphs for Expression scores will be necessary in the
operational program of the SEVTE and that the reliability for
the Expression score may be low.
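
The percentages cited for the forms by rater interaction follow
directly from the Table 12 estimates once the negative components
are treated as zero, as the table note indicates. A short Python
sketch of that arithmetic is given below; the variable names are
illustrative and are not taken from the report.

```python
# Variance component estimates transcribed from Table 12.
TABLE_12 = {
    "Persons": 29.170,
    "Forms": -5.379,
    "Raters": -3.321,
    "Persons x Forms": 6.737,
    "Persons x Raters": -0.670,
    "Forms x Raters": 10.563,
    "Residual": 9.767,
}

# Negative estimates are regarded as equivalent to zero (see table note).
components = {source: max(estimate, 0.0) for source, estimate in TABLE_12.items()}

total_variance = sum(components.values())
share_of_total = {source: value / total_variance
                  for source, value in components.items()}
ratio_to_persons = components["Forms x Raters"] / components["Persons"]

print(f"total variance: {total_variance:.3f}")
print(f"forms x raters relative to persons: {ratio_to_persons:.0%}")
for source, share in share_of_total.items():
    print(f"{source:<18} {share:.0%} of total variance")
```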

Table 13 presents the generalizability coefficients estimated
in a D-study from the variance components above, treating both
raters and forms as sources of error, for each grouping of one
or two forms and one or two raters.



Table 13

Estimated Generalizability Coefficients for the
SEVTE-Expression Production Score using Different
Groupings of Forms and Raters

Number of      Number of      Generalizability
Forms          Raters         Coefficient

1              1              .64
1              2              .71
2              1              .78
2              2              .83



The results in Table 13 show that the reliability for the
total SEVTE-Expression score on the Production section, when one
form and two raters are used, is .71, given errors due to both
forms and raters. Although this is only moderate, two things
should be noted. First, this score makes up only part of the
SEVTE total Expression score, since the Multiple Choice section
is also included in it. Second, the reliability using two forms
and two raters (as was the case in the validation study for the
development of the SEVTE) was an acceptable .83.
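
The report does not state the formula used in the D-study. As a
sketch only, the Python code below assumes the standard
relative-error generalizability coefficient for a fully crossed
persons by forms by raters design, with negative variance
estimates set to zero. Under that assumption the Table 12
components give values within about .01 of the coefficients in
Table 13. The function and dictionary names are illustrative.

```python
# Variance component estimates transcribed from Table 12.
VARIANCE_COMPONENTS = {
    "persons": 29.170,
    "forms": -5.379,
    "raters": -3.321,
    "persons_x_forms": 6.737,
    "persons_x_raters": -0.670,
    "forms_x_raters": 10.563,
    "residual": 9.767,
}


def g_coefficient(components, n_forms, n_raters):
    """Relative-error G coefficient for a crossed p x f x r design.

    Assumed formula (not confirmed by the report):
        var_p / (var_p + var_pf/n_f + var_pr/n_r + var_res/(n_f * n_r))
    Negative estimates are replaced by zero before use.
    """
    v = {name: max(value, 0.0) for name, value in components.items()}
    relative_error = (
        v["persons_x_forms"] / n_forms
        + v["persons_x_raters"] / n_raters
        + v["residual"] / (n_forms * n_raters)
    )
    return v["persons"] / (v["persons"] + relative_error)


estimates = {
    (n_forms, n_raters): g_coefficient(VARIANCE_COMPONENTS, n_forms, n_raters)
    for n_forms in (1, 2)
    for n_raters in (1, 2)
}

for (n_forms, n_raters), value in estimates.items():
    print(f"{n_forms} form(s), {n_raters} rater(s): {value:.3f}")
```

In this formulation only the interactions involving persons count
as error, so the forms by raters component does not enter the
coefficient. Adding a second form reduces error more than adding
a second rater, because the person by form component is the
larger of the two person interactions.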

The final total SEVTE Expression score is a composite of an
examinee's score on the Multiple Choice section of the test and
the Production section total, discussed above. Most of the
points that can be earned by an examinee in the SEVTE Expression
score are earned in the Multiple Choice section; i.e., the
Expression score is the sum of the three subscores in the
Production section (maximum of 45 points) and the MC section raw
score (maximum of 60 points), as explained in section 1.3 of this
report. Because the total Expression score is a composite of the
Multiple Choice section score and the Production score, it is not
possible to calculate a single empirical estimate of the
reliability of this composite score in the same convenient way
that one might do for a multiple choice test. There are,
however, a number of ways of looking at the reliability of this
composite score.
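
The make-up of the composite can be written out as a small
Python sketch. The ceiling of 15 points per Production criterion
is inferred from the maxima in Table 9 and the stated 45-point
total; the class name ExpressionScores and its fields are
hypothetical, and the weighting applied to Form 2 scores is
omitted because it is described elsewhere in the report.

```python
from dataclasses import dataclass

MAX_SUBSCORE = 15  # assumed ceiling per Production criterion (3 x 15 = 45)
MAX_MC = 60        # Multiple Choice raw score ceiling stated in the text


@dataclass(frozen=True)
class ExpressionScores:
    """Unweighted Expression part scores for one examinee on one form."""
    grammar: float
    expression: float
    mechanics: float
    mc_raw: int

    def __post_init__(self):
        # Reject part scores outside the ranges described in the report.
        for name in ("grammar", "expression", "mechanics"):
            if not 0 <= getattr(self, name) <= MAX_SUBSCORE:
                raise ValueError(f"{name} must be between 0 and {MAX_SUBSCORE}")
        if not 0 <= self.mc_raw <= MAX_MC:
            raise ValueError(f"mc_raw must be between 0 and {MAX_MC}")

    @property
    def production_total(self):
        """Sum of the three Production subscores (maximum 45)."""
        return self.grammar + self.expression + self.mechanics

    @property
    def composite(self):
        """Production total plus MC raw score (maximum 105)."""
        return self.production_total + self.mc_raw


# Hypothetical examinee, for illustration only.
example = ExpressionScores(grammar=10.0, expression=7.5, mechanics=11.0, mc_raw=48)
print(example.production_total, example.composite)
```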

First, in order to examine the effects of different raters
on the consistency of the composite SEVTE Expression score, we
can calculate the degree of agreement in composite Expression
scores when different raters score the Production section. The
correlation between the composite Expression scores, when the
points awarded by each rater are added to scores obtained on the

(with scores for Form 2 weighted as described in section 5.2).
These correlations are quite high, suggesting that the composite
Expression score is quite stable across raters.

A second way is to look at the consistency of scores earned 
on the two different forms. This comparison produces an index 
known as the coefficient of equivalence or parallel form 
reliability. This coefficient of equivalence is represented in 
Table 14 below. 

Table 14

Coefficient of Equivalence for SEVTE Expression Composite Scores
(N=43)

                   Form 2 Rater 1    Form 2 Rater 2

Form 1 Rater 1          .73               .78
Form 1 Rater 2          .82               ♦BS



This table depicts the four indexes of equivalence that can 
be calculated when each of two test forms is scored by two 
raters. As can be seen, the average coefficient of equivalence 
is about .81. 

A final way to examine the reliability of the composite
Expression score is to use coefficient alpha to examine the
reliability of the composite score formed by adding together the
two part scores (MC and Production). In other words, under this
procedure the two part scores are viewed as two subtests. It is
appropriate to do this when the subtests of a composite are
parallel. When the subtests of a composite are parallel, then
coefficient alpha can be referred to as the coefficient of
precision (Crocker and Algina, 1986, p. 121), which is an
estimate of test-retest reliability. An example of parallel
subtests would be an essay test score that is a composite score
based on two ratings. When the subtests or part scores are not
parallel, coefficient alpha must be thought of as a lower bound
estimate of this coefficient of precision.
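
For readers unfamiliar with the statistic, coefficient alpha for
a composite of k part scores is k/(k-1) times one minus the ratio
of the summed part-score variances to the variance of the
composite. The Python sketch below applies this to a two-part
composite. All scores shown are hypothetical, and taking the
Production part as the mean of two ratings is an assumption made
for the illustration.

```python
from statistics import variance


def coefficient_alpha(parts):
    """Coefficient alpha for a composite of two or more part scores.

    `parts` is a list of equally long score lists, one per part,
    with examinees in the same order in every list.
    """
    k = len(parts)
    if k < 2:
        raise ValueError("alpha needs at least two part scores")
    if len({len(part) for part in parts}) != 1:
        raise ValueError("all parts must cover the same examinees")
    composite = [sum(scores) for scores in zip(*parts)]
    composite_variance = variance(composite)
    if composite_variance == 0:
        raise ValueError("alpha is undefined when the composite is constant")
    part_variance_sum = sum(variance(part) for part in parts)
    return k / (k - 1) * (1 - part_variance_sum / composite_variance)


# Hypothetical scores for six examinees (not data from this study).
mc_scores = [52, 40, 47, 33, 58, 45]
production_rater_1 = [34, 25, 30, 22, 40, 28]
production_rater_2 = [32, 27, 33, 20, 38, 31]
production_mean = [(a + b) / 2
                   for a, b in zip(production_rater_1, production_rater_2)]

alpha = coefficient_alpha([mc_scores, production_mean])
print(f"coefficient alpha for the two-part composite: {alpha:.2f}")
```

With only two parts, alpha depends heavily on how strongly the
parts covary and on how similar their variances are, which is why
dissimilar subtests tend to yield only a lower bound.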

In applying coefficient alpha to the SEVTE Expression
scores, it is appropriate to average the production section
scores awarded by the two raters used in this study. This mean
score on the production section gives us the best estimate of the
scores that would be awarded by any other rater who may score
this test. Calculated in this manner, coefficient alpha is .76
for Form 1 and .53 for Form 2, with unweighted scores being used
for Form 2. Since the MC section and the Production section are
so different, they cannot be considered parallel subtests. Thus,
it is not surprising to find lower bound estimates of the
coefficient of precision for the SEVTE in this moderate range.

7. Examining the Validity of the SEVTE

According to the Standards for Educational and Psychological
Testing (American Educational Research Association et al.,
1985), test validity refers to "the appropriateness,
meaningfulness and usefulness of the specific inferences made
from test scores" (p. 9). Validity is demonstrated by an
accumulation of evidence that supports the claim of validity for
a particular test. Some of this evidence is empirical. Other
evidence may be qualitative, in that it deals with the content of
the test, or it may be theoretical, in that it deals with a
theory about the nature of the trait being measured by the test.
In the case of the SEVTE, the central validity concern is the
claim that the test is a measure of the ability to translate a
written text in Spanish into correct and appropriate English.

Traditionally, three types of validity are identified
according to how the evidence was gathered. These are
content validity, criterion-related validity, and construct
validity. Construct validity, which "focuses primarily on the
test score as a measure of the psychological characteristic of
interest" (AERA et al., 1985, p. 9), may be understood to
subsume the other two types; i.e., content and criterion-related
validity are also evidences of the construct validity of a test.
Thus, construct validity is of central interest. We will work
toward a discussion of the construct validity of the SEVTE by
beginning with an analysis of its content validity.
Subsequently, we will examine the construct validity of the test
more directly, through analyses of the trait that is being
measured by the test. Finally, we will examine the criterion-
related validity of the SEVTE by considering its relationship to
success at translating and to other measures of language
proficiency.

7.1. Content Validity

Content validity is evidence that demonstrates the degree to
which the sample of items, tasks or questions on a test is
representative of the domain of content that could be tested. In
the case of the SEVTE, evidence for its content validity is found
in the tasks examinees are asked to perform to demonstrate their
ability to translate from Spanish to English.

First, the Multiple Choice section involves two general
tasks required of Spanish/English translators: recognizing
whether a proposition in Spanish is rendered into English with
appropriate expression, and recognizing errors in written
English. Clearly, the ability to select the appropriate word or
phrase from among the many that could be available or correct in
other contexts is a skill that a translator must have. A
translator uses this ability to recognize infelicities in his or
her work in order to revise it successfully. In addition, the
ability to recognize errors in English is important because the
translator must be able to revise his or her first draft so that
it represents appropriate English expression. Otherwise, the
translator's English rendition can be accurate in terms of the
rendition of the content of the source document, but it will
still appear to be a translation.

Each form of the test contains 60 Multiple Choice items: 35
Words or Phrases in Context (WPC) items and 25 Error Detection
(ED) items. WPC items test a wide variety of points of English
and Spanish grammar. These points include subject-verb
agreement, verb tenses, pronouns, prepositions, gender, and word
order. They also test a range of Spanish-English vocabulary,
including nouns, verbs, adverbial and adjectival phrases, and
false cognates. Each item on each of the two forms of the test
focuses on the same or nearly the same aspect of grammar or
vocabulary. The 25 ED items include errors of grammar, word
order, vocabulary, punctuation or spelling in English only.
Thus, of the seven criteria included in the Translation Skill
Level Descriptions (accuracy, grammar, vocabulary, style, tone,
spelling, and punctuation) developed for this project, these
Multiple Choice items test all except style and tone.* (For
additional information relevant to the content validity of the
Multiple Choice section, see the content analysis in
Appendix D.)

*One way that vocabulary is tested is through the
mistranslation of words. Mistranslation involves both the
vocabulary and accuracy aspects of the SLDs. Thus, the construct
of Accuracy is partly represented in the content of the
multiple-choice section.

Second, apart from the ability to identify correct and
incorrect expression, the ability to produce a correct
translation is clearly required of a translator. The ability to
produce a correct translation is assessed through 28 direct
production tasks. Fifteen of these tasks involve the translation
of a word or phrase (called Word and Phrase Translation or WPT);
10 involve the English translation of complete Spanish sentences
(called Sentence Translation or ST) that range in length from 12
to 25 words; and 3 tasks require Paragraph Translation (PT), the
ability to produce an English translation of a paragraph written
in Spanish. The three paragraphs range in length from
approximately 80 to 120 words.

The 15 Word and Phrase Translation (WPT) items and the 10
Sentence Translation (ST) items present examinees with a variety
of problems in vocabulary, idioms, grammar (morphology) and
syntax. We judged the sentences to range in difficulty from 2+
to 4+ on the Translation Skill Level Descriptions, based on the
frequency and complexity of language they employ and the
difficulty that language presents to the translator.* The items
in each section are grouped by order of the perceived difficulty
of the sentence on the FBI/CAL SLDs. Corresponding items on each
of the two forms are parallel in content and perceived
difficulty.

*As indicated by Stansfield and Liskin-Gasparro in Duran et
al. (1985), it is heretical to classify decontextualized
language, such as words, phrases, or sentences, on the ILR scale.
Still, for research or training purposes it is sometimes
necessary to do this. An appropriate disclaimer of these
difficulty levels is noted here.

For WPT items, item developers relied on their expertise as
translators and as language teachers in order to develop
appropriate items. They created items that test aspects of the
language that present special difficulty when translated to the
target language, often cases where there is no direct equivalent.
For example, the expression "priced in the teens" has no direct
equivalent, and use of the dictionary would not be helpful. In
this case, the translator must use his or her knowledge of both
languages to construct an appropriate translation.

The ST items were constructed to include grammar problems
that have traditionally created difficulties for translators and
language students because of a lack of congruence between the two
languages. Such problems include pronouns, verb tenses and
sequences of verb tenses, use of negatives, possessives,
prepositions, and nontemporal verb forms, such as infinitive,
gerund, and past participle.

The first Paragraph Translation (PT) text is a newspaper
account, using mature vocabulary and syntax, of a crime that
occurred in a Spanish-speaking country. The subject of the crime
is airplane hijacking or sabotage, depending on the form of the
test. This text was judged to be a low level 3 text based on the
ILR skill level descriptions for reading.

The second PT text is political/philosophical in nature. It
deals with either the Armed Forces or ecology. The scoring
guidelines (see Appendix F) are based on the Translation Skill
Level Descriptions developed for this project. The difficulty
level of this text was judged to be at 3+.

The third PT text is a law or a legal interpretation of a
law. The scoring guidelines (Appendix F) are based on the
Translation Skill Level Descriptions developed for this project.
The guidelines for scoring all the paragraphs include nearly all
of the criteria included in the Translation Skill Level
Descriptions. The difficulty of this document is considered to
be at the 4+ or 5 level on the ILR skill level descriptions for
reading. Thus, the third text is clearly the most difficult.

The entire Production section is scored using scoring
guidelines that are based on the level descriptions in the
FBI/CAL Translation Skill Level Descriptions (see section 4.2 and
Appendix I). These descriptions were developed over a period of
six months and represent a consensus among experienced
translators and translation test evaluators.

The text material that appears on the SEVTE was influenced
by the results of the survey of FBI translation needs (see
Appendix Q and section 1.3 of this report). Twenty-eight
Language Specialists and agents responded to this questionnaire.
The results indicated that the written materials the respondents
most often deal with involve politics, narcotics, terrorism,
foreign counterintelligence, written laws, theft, and organized
crime. Many of the SEVTE texts were actually provided by the
FBI, and those found by CAL staff were judged relevant by FBI
Language Specialists. Texts found by CAL staff were taken from
two sources: public documents such as newspapers and magazines,
and documents that item writers had actually translated in their
own translation work. The selection of texts from public
documents was guided by sample texts provided by the FBI,
especially in terms of vocabulary. These texts, as well as the
texts that item writers had previously translated on the job,
were edited slightly to make them more suitable for these tests.
The third paragraph, which is a legal document written in
appropriate jargon (sometimes referred to as "legalese" among
government Language Specialists), was supplied by the FBI for
both forms of the ESVTE. In order to make the SEVTE as parallel
as possible to the ESVTE, CAL staff located similar legal
documents in Spanish for the SEVTE.

It is interesting to examine the responses of the validation
study subjects (Special Agents, Contract Linguists, Language
Specialists and others) to the exam feedback questionnaire they
completed after taking the test (see Appendix L). On this
questionnaire, 50% either agreed or strongly agreed with the
statement, "The material in the exams was representative of the
types of written documents I might encounter in my work."
The other 50% either disagreed or disagreed strongly with the
statement. It is difficult to interpret this data in terms of
job relevance. Judgments of the job relevance of a test are
highly dependent on the relationship between the test and the job
of the individual subject, and the subjects in the sample varied
greatly in the agency they worked for and in the job they
performed. It must be remembered that within the sample of 58
examinees, 22% were beginning and intermediate level CIA Spanish
language students who would never have translated such
material, 26% were FBI Special Agents, 19% were FBI Language
Specialists (or Contract Linguists who do similar work), 17% were
FBI support staff, and 10% were members of the Houston Police
Department. The SEVTE was designed with the knowledge that it
would be taken principally by potential and current Language
Specialists and others who might wish to demonstrate the ability
to do the type of translation that Language Specialists regularly
do. Yet Language Specialists made up only 19% of the validation
study sample. Under the circumstances, the responses to the job
relevance question on the exam feedback questionnaire are not as
negative as might have been expected.

One of the subjects wrote on the questionnaire: "The 
vocabulary and material given in this test do not represent the 
material we are required to work with in the field. This is
geared mainly to the FCI LS's (foreign counterintelligence work 
and Language Specialists) — not those of us working in the 
criminal/drug cases." This telltale comment, apparently written 
by a Special Agent, represents the perception that the test 
reflects written material that FBI Special Agents are not 
normally asked to translate. Most written translation is done by 
Language Specialists. Thus, although critical of the test, the 
above comment reflects the perception that the test is relevant 
to the work of an FBI Language Specialist. 

At the same time, it is noteworthy that there was a more
general agreement that the test measured translation ability.
59% of the subjects either agreed or strongly agreed with
the statement "There was sufficient opportunity for me to
demonstrate my ability to translate from Spanish to English." It
may be that the 41% who disagreed with this statement did so
because they felt unduly restricted by the time constraints of
the testing situation; over half (53%) of the subjects felt the
length of time given for the production section was "too short,"
and none felt it was "too long." 47% felt it was "about right."
(It may be noted that on the Multiple Choice section, examinees
were markedly more positive about the length of time given, with
92% indicating it was "about right," and 8% responding that it
was "too short.")

In interpreting the responses to the examinee questionnaire,
it is important to note that approximately 15% of those who took
the SEVTE in the validation study had received scores of 2+ or
less on the Spanish DPI (see section 4.4.3 above). These
subjects may have understandably felt pressured by the exam time
constraints, since nearly all of the tasks on the test were above
their level of ability. On the other hand, those subjects whose
proficiency was very high may not have had sufficient time to
revise their translations. Indeed, several of the examinees
indicated this to test administrators, who in turn reported it to
CAL on the test administrator report form. Because of this, CAL
has recommended that the amount of time allowed for completing
the Paragraph Translation subsection be increased from 37 to 48
minutes; i.e., 11 minutes more than examinees in the validation
study sample were permitted. This may have the effect of raising
scores on the test somewhat.*

*The general increase in the test scores that may be
obtained by increasing the time available to examinees to
complete the test should be viewed positively. It is likely that
if scores do increase under extended time limits, this will be
due to a reduction in test speededness, and the scores will be
more accurate. For additional information, see Appendix P.

In general, the implications for test validity of the
responses to the examinee questionnaire are lessened by the fact
that a) most examinees in the validation sample were not Language
Specialists, b) because of this, many had low ability in written
translation, and c) the test was too speeded. This last problem
has been corrected on the current form of the test by increasing
the time limit for the Paragraph Translations from 37 to 48
minutes.

7.2. Construct Validity

Traditionally, validity has been defined as the degree to
which a test measures what it claims to measure. Evidence of
validity has been divided into three types: content validity,
construct validity, and criterion-related validity. However,
during the past 15 years, validity has come to refer to the
inferences that can legitimately be made from test scores for a
particular type of examinee and for a particular purpose.
Similarly, construct validity has become synonymous with
validity itself (Messick, 1980). Because of this, the same
definition is also the contemporary definition of construct
validity. However, within the context of the validity section
of this report, we have made use of the traditional division of
kinds of validity in order to organize a fairly complex
presentation of the evidence for

limited, traditional definition of construct validity; that is,
the dimensions of ability that are being measured by the test.

In the introduction to this report we identified and
described two dimensions of translation ability: Accuracy and
Expression. We discussed how these dimensions evolved from our
efforts to develop Translation SLDs, from our research on the
Listening Summary Translation Exam, and from our initial scoring
of the SEVTE test papers. These two dimensions of translation
ability were strongly supported by the results of our analyses of
the SEVTE test data. Thus, we begin this analysis of the
construct validity of the SEVTE by stating that the test claims
to measure overall translation ability, but that it divides this
ability into two dimensions (Accuracy and Expression) and it
claims to measure each. Accuracy is the degree to which the
information in the source document is conveyed in the target
document. Errors in Accuracy include the misrepresentation or
deletion of information in the source document, or the inclusion
of information that was not in the source document. Expression,
on the other hand, focuses on the appropriateness of the language
used in the target document.

When a test measures two distinct dimensions, the measures
of those dimensions should demonstrate some unique score
variance. Thus, while the measures may be related, they should
be distinguishable. Table 15 below presents the correlations
between the total scores for Accuracy and Expression for Forms 1
and 2.

Table 15

Correlations between Mean Total Expression and Accuracy Scores
on Form 1 and Form 2
(n = 44)

            TOTEXPF1   TOTEXPF2   TOTACCF1   TOTACCF2

TOTEXPF1      1.00
TOTEXPF2       .83       1.00
TOTACCF1       .74        .63       1.00
TOTACCF2       .75        .73        .90       1.00

Legend: TOTEXPF1 = Total Expression Score, Form 1
        TOTEXPF2 = Total Expression Score, Form 2
        TOTACCF1 = Total Accuracy Score, Form 1
        TOTACCF2 = Total Accuracy Score, Form 2

As can be seen in Table 15, the correlation between these
two total scores for Form 1 is .74, while for Form 2 it is .73.
These moderate correlations suggest that the two subscores are
measuring different but related abilities. This finding is
further corroborated by examining the correlation between the two
scores that claim to represent the Accuracy dimension and the two
scores that claim to measure the Expression dimension. Note that
the correlation between the Accuracy score on Form 1 and the
Accuracy score on Form 2 is .90. Similarly, the correlation
between the Expression total score on Form 1 and the Expression
total score on Form 2 is .83. These correlations between
measures of the same dimension clearly exceed the correlations
between the measures of different dimensions mentioned above.
Thus, since each measure correlates more highly with a measure of
the same dimension than it does with a measure of a different
dimension, it is clear that the SEVTE measures two dimensions of
translation ability. Correlations of this nature suggest that
one score cannot serve as a substitute for the other. Because
individual examinees often have different ability levels on each,
both Accuracy and Expression need to be assessed on a Spanish to
English translation test for this population. However, because
the two measures show moderately high intercorrelations, each
subscore is also a measure of the global trait being measured by
the test.
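
The comparison made in this argument (same-dimension correlations
versus cross-dimension correlations) can be checked mechanically
against Table 15. The Python sketch below does so; the dictionary
layout and the helper name dimension are illustrative choices and
are not part of the report.

```python
# Off-diagonal correlations transcribed from Table 15.
TABLE_15 = {
    ("TOTEXPF1", "TOTEXPF2"): 0.83,
    ("TOTEXPF1", "TOTACCF1"): 0.74,
    ("TOTEXPF2", "TOTACCF1"): 0.63,
    ("TOTEXPF1", "TOTACCF2"): 0.75,
    ("TOTEXPF2", "TOTACCF2"): 0.73,
    ("TOTACCF1", "TOTACCF2"): 0.90,
}


def dimension(score_name):
    """Return 'EXP' or 'ACC' from a name such as 'TOTEXPF1'."""
    return score_name[3:6]


# Correlations between two measures of the same dimension.
same_dimension = {
    pair: r for pair, r in TABLE_15.items()
    if dimension(pair[0]) == dimension(pair[1])
}
# Correlations between an Expression measure and an Accuracy measure.
cross_dimension = {
    pair: r for pair, r in TABLE_15.items()
    if dimension(pair[0]) != dimension(pair[1])
}

# The pattern described in the text holds if even the weakest
# same-dimension correlation exceeds the strongest cross-dimension one.
pattern_holds = min(same_dimension.values()) > max(cross_dimension.values())

print("same dimension: ", sorted(same_dimension.values()))
print("cross dimension:", sorted(cross_dimension.values()))
print("pattern holds:  ", pattern_holds)
```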

We will now turn to a discussion of criterion-related 
validity. This discussion provides a better understanding of the 
global trait being measured and how it relates to other relevant 
traits. 

7.3. Criterion-related Validity

Criterion-related validity is evidence that "demonstrates
that test scores are systematically related to one or more
outcome criteria" (AERA, p. 11). For example, if supervisors'
ratings of employees' translation ability were available, then it
would be important to see how scores on the SEVTE and
supervisors' ratings compared. Unfortunately, the Special Agent
in Charge at each local FBI office is rarely able to rate the
translation ability of Language Specialists or Special Agents,
because a variety of languages may be represented in each field
office. Thus, an appropriate existing criterion variable was not
available to the authors of this study.

In an effort to remedy this situation, we constructed two
concurrent measures that can serve as variables for determining
criterion-related validity. The concurrent criterion-related
variables are described below.

Concurrent Criterion-Related Measures

Overall FBI/CAL Expression and Accuracy Scores (EXPFBICAL
and ACCFBICAL). After the two raters in the validation
study assigned analytical scores to each section of the
production section of the SEVTE, they assigned each examinee
two overall scores on the FBI/CAL Translation SLDs: one for
Expression and one for Accuracy, based on the examinee's
performance on the Sentences and Paragraph subsections of
the Production Section. Each examinee took two forms.
Thus, each examinee's overall FBI/CAL Expression and
Accuracy score is the average of four ratings (two raters by
two different forms). These overall FBI/CAL Expression and
Accuracy scores were obtained for all subjects. They
provide two measures of criterion-related validity.



The data on the two concurrent criterion-related validity 
measures provide a basis for assessing the criterion-related 
validity of the SEVTE. Correlations between the Total Accuracy 
and Expression scores on each form of the SEVTE with these 
concurrent measures are presented in Table 16 below. 

Table 16 

Correlations of the SEVTE Scores 
with Overall Rating of Translation Ability 
(Numbers of Paired Scores in Parentheses) 

          EXPFBICAL   ACCFBICAL 

EXP1        .88*        .76* 
            (44)        (44) 

EXP2        .89*        .75* 
            (43)        (43) 

ACC1        .78*        .89* 
            (44)        (44) 

ACC2        .83*        .92* 
            (43)        (43) 

* p < .0001 

Before beginning a discussion of these relationships, it is 
appropriate to consider the validity and reliability of the two 
measures of criterion-related validity (EXPFBICAL and ACCFBICAL). 

As indicated in the description of the FBI/CAL overall 
Expression and Accuracy ratings, after scoring each paper 
analytically, the raters then referred to the FBI/CAL Translation 
SLDs to determine an appropriate holistic rating for each 
examinee based on his or her performance on the Sentences and 
Paragraphs subsections of the Production section of the test. 
This holistic rating is a rating of overall translation ability 
based on performance in translating 10 challenging sentences and 
three paragraphs varying in difficulty. Thus, this holistic 
rating can be considered a performance-based assessment of 
translation ability. Its validity as such is limited slightly by 
the fact that, of the four ratings that make up the composite 
holistic rating (two ratings on each form), two were awarded by 
the same rater that scored the form correlated in Table 16 with 
the holistic rating. Thus, two of the ratings are not wholly 
independent. However, the other two ratings were based on 
success at translating different texts. In this case, the 
different texts were the sentences and paragraphs appearing on 
the other SEVTE form. While one approach might have been to use 
the FBI/CAL skill level assigned by the two raters who scored the 
other form as the criterion variable (as discussed in footnote 
30), we chose to combine all four ratings from the two forms into 
a single indicator of translation skill level in this study. 
This composite rating has the advantage of being based on twice 
as many performance tasks (20 sentences and six paragraphs) and 
twice as many ratings of translation skill level; that is, four 
ratings instead of two ratings. Thus, this composite rating of 
translation skill level can be considered to be both more 
reliable and more valid because of the number of tasks and 
evaluations (ratings) on which it was based. 

In order to determine the reliability of the criterion 
variables, i.e., the composite FBI/CAL overall rating of 
translation ability for Accuracy and Expression, a 
Generalizability (G) study was performed on the data that went 
into the composite rating. A G study is a statistical technique 
in which the contributions of various factors (facets) to the 
total variance of the test scores are estimated. For this 
particular study, we wanted to estimate how much error 
variance was contributed by the raters and the forms. (The 
forms are the two different samples of translation ability that 
are elicited by SEVTE Form 1 and Form 2.) There were 44 
examinees and two raters involved in the G study. Thus, both 
criterion variables (EXPFBICAL and ACCFBICAL) received four 
ratings. In our study, we wanted to estimate the 
generalizability coefficient for the average translation ability 
rating for Expression and Accuracy when two ratings on two forms 
were used to construct the average. The G coefficient is an 
estimate of reliability, based on the ratio of the variance of 
the objects of measurement (in this case persons) over that 
variance plus error variance due to forms, raters, and their 
interactions. The results of the studies indicated that the G 
coefficient for the EXPFBICAL rating is .85 and the G coefficient 
for the ACCFBICAL rating is .88. These G coefficients may be 
considered the reliability of these two criterion variables. 
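
The ratio described above can be illustrated with a minimal sketch, 
assuming a fully crossed persons x raters x forms design and the 
usual relative-error form of the G coefficient. The function name 
and the variance components below are hypothetical; the report does 
not list the components estimated in the study, so the numbers are 
for illustration only. 

```python
def g_coefficient(var_p, var_pr, var_pf, var_prf_e, n_raters, n_forms):
    """Relative G coefficient for a persons x raters x forms design.

    var_p      variance due to persons (the objects of measurement)
    var_pr     person-by-rater interaction variance
    var_pf     person-by-form interaction variance
    var_prf_e  residual (three-way interaction confounded with error)
    """
    relative_error = (var_pr / n_raters
                      + var_pf / n_forms
                      + var_prf_e / (n_raters * n_forms))
    return var_p / (var_p + relative_error)


# Hypothetical variance components (NOT taken from the study).
components = dict(var_p=0.40, var_pr=0.02, var_pf=0.08, var_prf_e=0.06)

# Averaging over two raters and two forms versus two raters and one form.
two_forms = g_coefficient(**components, n_raters=2, n_forms=2)
one_form = g_coefficient(**components, n_raters=2, n_forms=1)
print(round(two_forms, 2), round(one_form, 2))
```

Under these assumed components, the coefficient drops when only one 
form contributes to the average, because the form-related error is 
no longer divided by two. 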

Returning now to Table 16, the correlations between the 
criterion variables (EXPFBICAL and ACCFBICAL) and the SEVTE 
Expression and Accuracy scores are consistently high. Of the 
eight correlations shown, the lowest is .75 and the highest is 
.92. The fact that scores on the SEVTE correlate highly with 
overall translation skill level ratings strongly supports the 
validity of the two scores. 

Further analysis shows that the correlations are strongest where 
one might expect. The correlation between the SEVTE Expression score 
and the Expression criterion variable (EXPFBICAL) is .88 for 
Form 1 and .89 for Form 2. This is strong evidence of the 
validity of the SEVTE Expression score. 

Similarly, the correlation between the SEVTE Accuracy score 
and the Accuracy criterion variable (ACCFBICAL) is high also: .89 
for Form 1 and .92 for Form 2. This is strong evidence for the 
validity of the SEVTE Accuracy score. 

Although we chose to use the average of the four overall 
FBI/CAL translation ability level ratings here as a criterion 
variable, it is interesting to consider the correlations between 
the SEVTE Expression and Accuracy scores on one form and the 
overall FBI/CAL translation ability level ratings assigned by the 
raters based on the examinee's performance on the other form. In 
this case, the other form is a totally independent criterion 
variable. That is, the rating is based on the examinee's 
performance on other translation tasks like the ones that an 
examinee would have to perform on the job. 

Here the validity coefficients are also quite good. The 
correlation between the SEVTE Expression total based on Form 1 
and the average of the two overall FBI/CAL translation skill 
level ratings assigned based on Form 2 Sentences and Paragraphs 
is .83. Similarly, the correlation between the Expression total 
based on Form 2 and the average of the two overall FBI/CAL 
translation skill level ratings assigned based on Form 1 
Sentences and Paragraphs is .81. 

The correlation between the SEVTE Accuracy total based on 
Form 1 and the average of the two overall FBI/CAL translation 
skill level ratings assigned based on Form 2 Sentences and 
Paragraphs is .83. Similarly, the correlation between the 
Accuracy total based on Form 2 and the average of the two overall 
FBI/CAL translation skill level ratings assigned based on Form 1 
Sentences and Paragraphs is .81. 

Again, it must be remembered that these overall FBI/CAL 
translation skill level ratings are less reliable than those 
included in Table 16. The G study showed the G coefficient with 
one form and two ratings to be .77 for EXPFBICAL and .79 for 
ACCFBICAL. 

7.4. Convergent/Discriminant Construct Validity 

Because the evidence in Table 16 so clearly supports the 
validity of the SEVTE as a measure of Spanish-English translation 
ability, a fuller discussion of evidence for the construct 
validity of the test is warranted. Such a discussion can be 
obtained by considering the convergent/discriminant nature of the 
correlations between the SEVTE and other measures theoretically 
related to the construct of interest. In such a discussion, an 
expected correlation of the test with each variable is analyzed 
and discussed. Some criteria will be expected to show a strong 
relationship with the test whose validity is being examined, 
while other criteria will be expected to show a weak correlation, 
or to not correlate at all, or even to correlate negatively. We 
will make use of the convergent/discriminant validity approach 
here in order to fully examine the construct validity of the 
SEVTE. 

In an effort to attain further understanding of the 
construct measured by the SEVTE, two concurrent measures were 
collected. These concurrent measures are described below. 



1. A self-rating (SPENSELF and ENSPSELF). CAL developed two 
questionnaires that asked subjects a) with what types of 
documents they had experience translating from Spanish into 
English and English into Spanish; and b) if they had 
experience, to rate their translation ability of these 
documents as either "Limited," "Functional," "Competent," 
or "Superior." These questionnaires were administered to 
the subjects immediately preceding the administration of 
the first part of the corresponding test. A copy of these 
questionnaires is contained in Appendix N. Each subject's 
responses to these two questionnaires were converted into 
self-rating scores (Spanish into English = SPENSELF; 
English into Spanish = ENSPSELF) by first awarding points 
to each item that subject rated (1 for "Limited," 2 for 
"Functional," 3 for "Competent," 4 for "Superior," with N/A 
receiving no value) and then calculating the mean response 
to all items for which he or she provided a self-rating. 
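
The scoring rule just described can be sketched as follows. This is 
a minimal illustration, not the procedure actually used to process 
the questionnaires; the function name and the example responses are 
hypothetical, while the point values are those stated above. 

```python
from statistics import mean

# Point values stated in the text; "N/A" receives no value.
POINTS = {"Limited": 1, "Functional": 2, "Competent": 3, "Superior": 4}


def self_rating_score(responses):
    """Mean of the rated items only; None if every item was marked N/A."""
    rated = [POINTS[r] for r in responses if r != "N/A"]
    return mean(rated) if rated else None


# Hypothetical responses to the 12 document types (not real data).
example = ["Competent", "N/A", "Superior", "Functional", "N/A", "N/A",
           "Competent", "Competent", "N/A", "Limited", "N/A", "Superior"]
score = self_rating_score(example)
print(score)
```

Because N/A items are dropped rather than scored, two subjects with 
the same score may have rated very different numbers of document 
types. 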

In addition, data were collected, where available, on six 
nonconcurrent tests that had been administered within one to 
eight years of the study. 



Previously Administered Measures 

1. A Spanish OPI score (SPANSPK). An oral proficiency 
interview (OPI) score for Spanish was collected for as 
many subjects as possible. Although this is not a 
wholly adequate criterion variable, it is relevant to 
translation ability. Speaking proficiency assumes and 
is moderately correlated with Spanish reading 
proficiency. Correlations between the two skills 
typically are between .50 and .75. Thus, on a 
theoretical basis, it was decided that the OPI score 
could be used to provide additional evidence of 
criterion-related validity. For all ILR scores in 
this study, the following conversion was used for 
purposes of empirical analyses: 

ILR Score     Numerical Score 

   0+              0.8 
   1               1.0 
   1+              1.8 
   2               2.0 
   2+              2.8 
   3               3.0 
   3+              3.8 
   4               4.0 
   4+              4.8 
   5               5.0 



2. Other test scores . Other scores that measure possibly 
related constructs were collected where possible. None 
of these scores could be collected for all the 
subjects, however. These scores, the number of 
subjects for which they were collected, and their 
descriptive statistics are given below, together with 
the same information on all of the measures. 



Measure       N      Mean   Std. Dev.  Minimum  Maximum 

EXPFBICAL    44      2.86     0.67      1.30     4.65 
ACCFBICAL    44      2.58     0.72      0.90     4.25 
SPENSELF     43      2.89     0.67      1.0      4.0 
ENSPSELF     35      2.90     0.62      1.0      4.0 
SPANSPK      36      4.14     0.98      2.0      5.0 
DLPTLIST     28     52.75     5.06     39       60 
DLPTREAD     28     53.25     6.54     30       60 
ENGSPK       17      4.21     0.60      3.0      5.0 
SPENTRAN     17      3.45     0.96      2.0      4.8 
ENSPTRAN     17      3.29     0.65      1.8      4.0 


Key 

EXPFBICAL  Overall ILR expression score. 

ACCFBICAL  Overall ILR accuracy score. 

SPENSELF   Average score on the Spanish into English Verbatim 
           Translation Ability Self Assessment Questionnaire. 

ENSPSELF   Average score on the English into Spanish Verbatim 
           Translation Ability Self Assessment Questionnaire. 

SPANSPK    An OPI score for Spanish. 

DLPTLIST   The listening section of the Defense Language Institute 
           Proficiency Test. Maximum possible score = 60. 

DLPTREAD   The reading section of the Defense Language Institute 
           Proficiency Test. Maximum possible score = 60. 

ENGSPK     An OPI score for English. 

SPENTRAN   An ILR score on the current FBI Spanish into English 
           verbatim translation exam. 

ENSPTRAN   An ILR score on the current FBI English into Spanish 
           verbatim translation exam. 



Relationships between scores on these measures and scores 
on the SEVTE were calculated in order to examine the 
convergent/discriminant validity of the SEVTE. 
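
The ILR-to-numeric conversion given earlier in this section follows 
a simple rule: a base level maps to itself and a "plus" level adds 
0.8. A minimal sketch of that rule is shown here; the function name 
is hypothetical and is not part of the study's materials. 

```python
def ilr_to_numeric(level):
    """Convert an ILR level such as "2+" to the numeric scale in the text.

    A base level N maps to N.0 and a plus level N+ maps to N.8.
    """
    level = level.strip()
    if level.endswith("+"):
        return int(level[:-1]) + 0.8
    return float(int(level))


# A few of the levels listed in the conversion table.
scores = [ilr_to_numeric(s) for s in ["0+", "1", "2+", "3", "4+", "5"]]
print(scores)
```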
7.4.1. Convergent Validity 

Correlations between the Total Accuracy and Expression 
scores on each form of the SEVTE with the concurrent measures are 
presented in Table 17 below. (Note that the SEVTE scores in this 
table represent a composite of the two ratings. In addition, 
examinees were not penalized if they did not attempt a paragraph 
due to lack of time.) The number of subjects involved in the 
correlation is also given, since not every subject had a score on 
every measure; i.e., the numbers in parentheses represent the 
number of subjects who had a score on both measures being 
correlated. The magnitude of the Ns should be considered in 
making interpretations. Larger Ns allow a greater degree of 
confidence in the indicated relationship. In general, none of 
the Ns are large, suggesting that the correlations should not be 
considered stable. 



Table 17 

Correlations of the SEVTE Scores 
with Other Available Measures 
(Numbers of Paired Scores in Parentheses) 

      SPENSELF  ENSPSELF  SPANSPK   DLPTLIST  DLPTREAD  ENGSPK    SPENTRAN  ENSPTRAN 

EXP1  .43*      [illeg.]  .04       [illeg.]  .45*      .50*      .50*      .49* 
      (43)      (35)      (36)      (28)      (28)      (17)      (17)      (17) 

EXP2  .28       .25       -.07      .43*      .30       .51*      .50*      .50* 
      (42)      (34)      (35)      (27)      (27)      (17)      (17)      (17) 

ACC1  .63*      .42*      .47*      .76*      .70*      .47       .57*      .75* 
      (43)      (35)      (36)      (28)      (28)      (17)      (17)      (17) 

ACC2  .59*      .51*      .36*      .6?*      .60*      .53*      .46       .68* 
      (42)      (34)      (35)      (27)      (27)      (17)      (17)      (17) 

* p < .05 



We will now discuss the relationships in Table 17, 
referring again, when appropriate, to the data in Table 16. The 
accuracy of this discussion is tempered by the fact that no 
reliability statistics are available on any of these criterion 
measures. Even so, since these are the only 
data available, there is no other option than to examine and 
interpret the suggested relationships. Since the magnitude of 
these relationships is attenuated to the extent that the tests 
are less than perfectly reliable, one can generally assume that 
the relationships are at least as strong as are indicated here. 
On the other hand, the reliability of the SEVTE scores does not 
pose a problem, since all the SEVTE reliabilities are quite high. 
(See sections 6.2 and 6.3.) 
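
The attenuation argument above can be made concrete with the 
classical correction for attenuation, in which an observed 
correlation is divided by the square root of the product of the two 
reliabilities. The sketch below is illustrative only: the function 
name and the input values are assumptions, since no reliability 
estimates exist for these criterion measures. 

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation.

    Estimates the correlation between true scores from the observed
    correlation r_xy and the reliabilities of the two measures.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical inputs for illustration only (not results of the study).
observed_r = 0.50
corrected = disattenuate(observed_r, rel_x=0.90, rel_y=0.70)
print(round(corrected, 2))
```

With any reliability below 1.0 the corrected value exceeds the 
observed one, which is the sense in which the observed correlations 
can be read as lower bounds. 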

First, it is most notable that there were moderate 
correlations, most of them significant, between the SEVTE Total 
Accuracy score and all the criterion variables. The correlations 
between the SEVTE Expression score and the criterion variables 
were usually not as high as the correlations for the Accuracy 
score, and they are not always significant. This supports the 
centrality of the Accuracy score in the measurement of 
translation ability. 

Accuracy is the degree to which the information in the 
source document is conveyed in the target document. Errors in 
Accuracy include the misrepresentation or deletion of information 
in the source document, or the inclusion of information that was 
not in the source document. Expression, on the other hand, 
focuses on the appropriateness of the language selected for use 
in the target document. 

In the tables above, we would expect a positive 
correlation between the SEVTE Accuracy score and the Spanish into 
English self-assessment of this ability (SPENSELF). These 
correlations, depicted in the left column of Table 17 above, are 
.63 for Form 1 and .59 for Form 2. These moderately strong 
correlations support the validity of the SEVTE Accuracy score. 
The lower correlations between SPENSELF and SEVTE Expression (.43 
and .28) suggest that factors other than the ability to 
translate the information, i.e., English writing ability, may 
play a larger role in the Expression rating. Again, no data are 
available on the reliability of the SPENSELF questionnaire. 



The question of the reliability of the questionnaires used 
to calculate each subject's self-assessment score deserves some 
comment here. When dealing with the internal consistency 
reliability of a measurement instrument, the estimated 
reliability coefficient is an indication of the extent to which 
items comprising the measure are tapping into the same underlying 
trait or ability. This assumes that each item was written to 
measure this trait or ability, and that all examinees would 
answer all items. 

The nature of the two questionnaires from which self-assessment 
scores were calculated here was somewhat different in 
that each subject gave a self-rating only to a subset of the 
"items." These "items" were the document types with which he or 
she had experience. In the vast majority of cases, subjects did 
not have experience in translating all the document types; thus, 
self-rating scores were sometimes based on only 3 or 4 responses. 
The response on the other "items" was "Not Applicable," to which 
no reasonable numerical value could be assigned; "Not Applicable" 
means that the subject does not translate such document types. 

When missing data occur in a questionnaire database, 
there are several ways to deal with the problem under certain 
circumstances. Inadvertently missing data may be replaced by an 
estimate of that subject's response to the item, such as using 
his or her mean score on items answered or the mean response of 
all subjects answering that item. On certain measures, such as 
on an attitudinal questionnaire, a missing value may be 
appropriately interpreted as the subject's having no opinion or 
not caring about the issue in the item, and a missing value can 
then be replaced by a neutral response. 

Had we been able to treat these responses as missing data, 
there would have been several ways to estimate the reliability of 
the two questionnaires. However, on the questionnaires used 
here, a response of "Not Applicable" is not missing data. To 
replace these responses with a numerical value (such as the 
subject's mean response) is contrary to the subject's own rating 
of "Not Applicable" to that "item" (document type). Furthermore, 
even if it were appropriate to treat the response as missing 
data, making a large number of replacements, as would be required 
here, would inflate reliability by increasing interitem 
consistency in proportion to the number of responses of "Not 
Applicable" that were replaced by each subject's mean response. 
The resultant estimate of reliability would thus be spuriously 
high and it would not be interpretable. 

The correlations between the SEVTE and the self-rating of 
ability to translate each of the 12 types of documents included 
on the SPENSELF questionnaire are found in Appendix N. Given the 
relatively small proportion of Language Specialists in the 
sample, it is possible that most examinees did not have 
experience translating such documents on the job. An attempt was 
made to correct for this in the design of the questionnaire by 
telling people in the instructions, "If you have never translated 
a particular type of document, please mark N/A (not applicable)." 
While almost all subjects completing the questionnaire (43) 
indicated that they translated correspondence (letters) (98%), 
the mean number of document types responded to, out of the 12, 
was 7.79. While all document types received at least a 47% 
response, the average examinee responded N/A to about a third of 
the document types. Thus, it may be inferred that translation of 
documents other than letters is performed rarely by most 
examinees and consequently that most examinees may not have 
had a valid basis for making judgments of their ability. 

It is worthwhile to consider the correlations between 
SEVTE scores and the self-ratings of ability to translate the 12 
document types included on the Spanish-English Self-Assessment 
Questionnaire. The correlations between the SEVTE 
Accuracy score for Forms 1 and 2 and the 12 document types were 
significant. The correlations ranged from .42 to .74. The 
highest correlations were with the ability to translate foreign 
diplomatic reports (.73 and .74), depositions (.73 and .72), 
foreign counter-intelligence status/evaluation reports (.65 and 
.57), correspondence (.59 and .64), letters rogatory (.54 and 
.62), police reports (.56 and .56), and news editorials (.57 and 
.51). These correlations, individually and as a whole, provide 
evidence of the convergent validity of the SEVTE Accuracy score. 
The fact that the correlations are so similar for the two forms 
also bodes well for the comparability of the two forms. That is 
to say, they appear to measure the same construct. 

The first correlation in parentheses is with the Accuracy 
score for Form 1 and the second is with the Accuracy score for 
Form 2. All of the correlations and the Ns on which they are 
based are available in Appendix N. 

The correlations between the 12 document types and the 
SEVTE Expression score were lower and less than half were 
statistically significant. 

Another overall measure of translation ability is the 
FBI's current Spanish to English translation test (SPENTRAN). 
(See column 7 above.) The SEVTE Accuracy and Expression scores 
correlated moderately with this test (.48 to .57) for the 17 
examinees for whom scores on this test were available. One must 
remember that the FBI is not satisfied with the reliability and 
validity of this test. Thus, the lack of a high correlation 
with the SPENTRAN should not be a source of concern. Under the 
circumstances, the magnitude of this correlation is acceptable. 

No evidence of the reliability of this test has ever been 
gathered. 

Theoretically, the ability to translate from Spanish to 
English should require reading ability in the language of the 
source document, which is Spanish. The measure of Spanish 
reading ability used here was the DLPT Reading subtest. The 
SEVTE Accuracy score showed moderately high correlations (.70 and 
.60) with the DLPTREAD, which indicates that it is sensitive to 
Spanish reading proficiency. One would expect the SEVTE 
Expression score to be less related to Spanish reading ability. 
The Expression correlations with DLPTREAD (.45 and .30) show that 
this was indeed the case, and in the case of Form 2, the 
correlation was not significantly different from zero. 

Another measure of Spanish ability available was the 
Spanish OPI score (SPANSPK). There was a moderate correlation 
(.47 and .36) between SPANSPK and SEVTE Accuracy, confirming 
that Spanish language ability is related to the ability to 
translate information from Spanish to English. However, there 
was no correlation (.04 and -.07) between SPANSPK and SEVTE 
Expression. This indicates that Spanish speaking ability is not 
related to the ability to translate a Spanish language text using 
appropriate English written expression. This is as expected, and 
supports the use of two separate scores for the SEVTE. 

English proficiency should also be necessary to translate 
from Spanish to English. The only measure of this proficiency 
available was the English OPI score (ENGSPK). The correlation 
between English speaking proficiency and SEVTE Accuracy (.47 and 
.53) was about the same as it was for Spanish speaking 
proficiency. In addition, the ENGSPK correlation with Expression 
(.50 and .51) is about equal in magnitude to its correlation with 
Accuracy, suggesting that both SEVTE scores are related to 
English proficiency. It may be noted here that whereas SPANSPK 
was not correlated with total Expression scores, ENGSPK was. This 
is understandable, since English speaking ability can be expected 
to correlate with English writing ability, whereas Spanish 
speaking ability would not be expected to correlate with English 
writing ability. 

7.4.2. Discriminant Validity 

Another criterion-related approach to establishing 
construct validity is to consider all the measures as a whole and 
contrast the correlations. First, one begins with the measures 
that one would expect to show a low correlation with the SEVTE. 
Then, one contrasts these measures with the correlations for the 
measures that one would expect to correlate more highly with the 
SEVTE. If the correlation with the variables expected to be more 
relevant is indeed greater, then this is evidence of discriminant 
validity. Thus, one examines the magnitudes, the differences, 
and the direction of the differences of the correlations to see 
if they fulfill a priori expectations. This process establishes 
the discriminant validity of the test under consideration. Using 
this approach, the data from the validation study are usually, 
although not always, supportive of the construct validity of the 
SEVTE as a test of Spanish to English translation ability. 

First, we will begin by comparing the SEVTE with the two 
concurrent criterion-related validity variables in Table 16. 
These variables are the composite rating of translation skill 
level assigned by the raters after analytically scoring the 
production section of the test. In Table 16, we see that the SEVTE 
Expression score correlates more highly with the translation 
skill level for Expression (EXPFBICAL) than it does with the 
translation skill level for Accuracy (ACCFBICAL) (.88 and .89 
versus .76 and .75). We also see that the SEVTE Accuracy score 
correlates more highly with the translation skill level for 
Accuracy (ACCFBICAL) than it does with the translation skill 
level for Expression (EXPFBICAL) (.89 and .92 versus .78 and 
.83). 
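
The comparison just described can be checked mechanically. The 
sketch below uses the correlations reported in Table 16 and 
subtracts, for each SEVTE score, the correlation with the 
other-dimension criterion from the correlation with the 
same-dimension criterion. The helper names are hypothetical, and a 
positive gap is only a descriptive indication; it is not a 
significance test of the difference. 

```python
# Correlations copied from Table 16: SEVTE score -> criterion variable.
table16 = {
    "EXP1": {"EXPFBICAL": 0.88, "ACCFBICAL": 0.76},
    "EXP2": {"EXPFBICAL": 0.89, "ACCFBICAL": 0.75},
    "ACC1": {"EXPFBICAL": 0.78, "ACCFBICAL": 0.89},
    "ACC2": {"EXPFBICAL": 0.83, "ACCFBICAL": 0.92},
}


def same_dimension(score_name):
    """Criterion variable that measures the same dimension as the score."""
    return "EXPFBICAL" if score_name.startswith("EXP") else "ACCFBICAL"


def discriminant_gaps(table):
    """Same-dimension correlation minus the other-dimension correlation."""
    gaps = {}
    for score, corrs in table.items():
        same = same_dimension(score)
        other = next(name for name in corrs if name != same)
        gaps[score] = round(corrs[same] - corrs[other], 2)
    return gaps


gaps = discriminant_gaps(table16)
pattern_holds = all(gap > 0 for gap in gaps.values())
print(gaps, pattern_holds)
```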

Second, we will compare the SEVTE with other measures of 
Spanish-English translation ability. The self-assessment 
questionnaires (SPENSELF and ENSPSELF) completed by examinees 
prior to the exam are two such measures. One would expect to 
find a stronger relationship between SEVTE scores and the 
SPENSELF than between the SEVTE scores and the ENSPSELF, since 
the ENSPSELF is a measure of translation in the opposite 
direction. Columns one and two in Table 17 indicate that this 
turned out as expected. All four of the SPENSELF correlations 
are larger than the corresponding ENSPSELF correlation. 

Two other such measures are the FBI's current translation 
tests (SPENTRAN and ENSPTRAN) . One would expect a stronger 
relationship between the SEVTE and the SPENTRAN, since both 
purport to measure the ability to translate in the same 
direction. Such an outcome was not found, however. In two out 
of four comparisons, the ENSPTRAN showed the stronger correlation 
and in two cases there was essentially no difference. Again, one 
must remember that these current FBI tests are considered to have 
limited validity. 

Another issue is the relative importance of the two 
languages to the two scores. One would expect the SEVTE 
Expression score to be more strongly related to English 
proficiency than to Spanish proficiency, since, on the SEVTE, the 
examinee actually performs in English. The one measure of 
English proficiency available is ENGSPK and the three measures of 
Spanish proficiency available are SPANSPK, DLPTLIST, and 
DLPTREAD. The SEVTE Expression score shows a far greater 
correlation with ENGSPK (.50 and .51) than with SPANSPK (.04 and 
-.07), which is a measure of the corresponding skill (speaking). 
The direction of the difference is as one would expect. SEVTE 
Expression also shows a higher correlation with ENGSPK than with 
DLPTREAD (Spanish reading) (.45 and .30), which is also as one 
would expect. However, the SEVTE Expression correlation with 
DLPTLIST is about equal to the correlation with ENGSPK, even 
though one would expect it to correlate higher with ENGSPK. 
There is no obvious explanation for why the correlation with 
DLPTLIST was so high, since translation does not involve 
listening. Again, one 
must remember that the sample size for this correlation was small 
(N = 17), and that correlations based on small Ns can vary greatly 
from the true correlation. 
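
How much a correlation based on a small N can vary may be 
illustrated with an approximate confidence interval built on the 
Fisher z transformation. This is a sketch under the usual 
assumption of bivariate normality; the function name is 
hypothetical, the interval was not part of the original analysis, 
and the N of 200 is included only for contrast. 

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)                      # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)            # standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# A correlation of .50 with N = 17, as for several cells of Table 17,
# compared with the same correlation at a hypothetical N of 200.
low_small, high_small = fisher_ci(0.50, 17)
low_large, high_large = fisher_ci(0.50, 200)
print(round(low_small, 2), round(high_small, 2))
print(round(low_large, 2), round(high_large, 2))
```

The interval for N = 17 spans most of the positive range, which is 
why differences of a few hundredths between such correlations 
should not be over-interpreted. 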

Similarly, one would expect the SEVTE Accuracy score to be 
more strongly related to proficiency in Spanish than is 
Expression. The data for the three measures of Spanish 
(SPANSPK, DLPTLIST, DLPTREAD) show this to be the case. In fact, 
the difference in the correlations for Accuracy and Expression is 
far greater on these measures of Spanish than for three other 
measures in Table 17, namely ENGSPK, SPENTRAN, and ENSPTRAN. 

Accuracy requires the correct comprehension of the Spanish 
language propositions, whereas Expression does not. That is, one 
can score high on Expression and still not render an accurate 
translation. 

It is interesting to note that the self-ratings of 
translation ability, SPENSELF and ENSPSELF, also exhibit a 
similar difference in their correlations with SEVTE Accuracy and 
Expression, whereas the FBI's previous measure, SPENTRAN, does 
not exhibit any differential in the magnitude of its correlation 
with SEVTE Accuracy and Expression. This suggests that SPENTRAN 
seems to measure both constructs equally. On the other hand, 
ENSPTRAN does correlate more highly with SEVTE Accuracy than with 
Expression, suggesting that it focuses on accuracy, or that 
accuracy plays a more important role in the ENSPTRAN than in the 
SPENTRAN. 

Similarly, since Accuracy theoretically involves both 
languages about equally, one would expect fairly similar 
correlations between Accuracy and corresponding measures of 
proficiency in both languages. A comparison of the correlations 
with oral proficiency in the two languages, which is the only 
measure for which corresponding scores are available in the two 
languages, shows that the correlations between Accuracy and 
SPANSPK and between Accuracy and ENGSPK are equal for Form 1 but 
not equal for Form 2. For Form 2, the correlation between SEVTE 
Accuracy and ENGSPK was slightly higher. 

It was indicated earlier that Accuracy is the principal 
measure of translation ability while Expression focuses on the 
appropriateness of the usage in the target language document. 
Thus, one would expect higher correlations with the criterion 
variables for Accuracy than for Expression, which was also found 
to be true. The exception to this expectation would be the 
criterion variable that assesses English proficiency. Here, one 
would expect to find Expression correlating at least as high as 
Accuracy, and perhaps higher. An examination of the SEVTE 
Accuracy and Expression correlations with ENGSPK in Table 17 
shows this expectation was met. Accuracy correlates .47 and .53 
with ENGSPK and Expression correlates .50 and .51. Thus, the 
correlations with ENGSPK are essentially equal. 

7.5. Conclusions 

From this discussion of the construct validity of the 
SEVTE through the examination of criterion-related, convergent 
and discriminant relationships with other measures, four 
conclusions can be reached. 

First, SEVTE Accuracy and Expression measure different 
constructs. While the two constructs are correlated, the 
correlations (.74 to .75) are far from perfect. Thus, neither
score can serve as a substitute for the other. The fact that a
person can translate information accurately from Spanish does not
mean that he or she can express it appropriately in English.
Similarly, the fact that a person can express a translation 
appropriately in English does not mean that the information is 
accurate. 

Second, both SEVTE Accuracy and SEVTE Expression appear to 
be valid measures. Both were found to correlate highly with 
translation skill levels assigned by comparing direct 
translations to the FBI/CAL translation skill level descriptions. 
SEVTE Accuracy was found to correlate with self-ratings of
ability to translate various kinds of Spanish language documents 
on the job, with the FBI's current translation tests, with scores 
on all language proficiency tests, including measures of Spanish 
listening, speaking, and reading, and English speaking. 
Expression was found to correlate with all of the above measures, 
except Spanish speaking. 

Third, Accuracy is the central construct. That is,
Accuracy is the more valid measure of translation ability. In
this study, Accuracy showed moderate to moderately high
correlations with all the criterion variables. Expression is not
as highly nor as consistently correlated with the criterion
variables as Accuracy. Thus, Expression can be viewed as
representing a secondary, although still important, construct in
translation.

Fourth, an analysis of discriminant validity provides
additional, generally positive evidence for the validity of both
Accuracy and Expression. The SEVTE Accuracy measure correlates
more highly with the FBI/CAL translation skill level for Accuracy
than with the FBI/CAL translation skill level for Expression.
The SEVTE Expression measure correlates more highly with the
translation skill level for Expression than with the translation
skill level for Accuracy. Both measures correlate more highly
with self-ratings of Spanish-English translation ability than
with self-ratings of English-Spanish translation ability.
However, similarly clear evidence was not found in the
correlations with the FBI's current tests of translation ability.

Finally, the SEVTE correlations with the various measures 
of language proficiency permit three additional conclusions about
the role of various language skills in each SEVTE score. 

First, English, the target language, plays a greater role 
in the Expression score than does Spanish, the source language. 
In this study, there was one measure of English proficiency and 
three measures of Spanish proficiency. The one English 
proficiency measure showed a greater correlation with Expression
than did the Spanish measures. 

Second, Spanish and English (the source and target
languages) play approximately equal roles in the Accuracy score.
In this study, all four language measures showed moderate to
moderately high correlations with Accuracy. For the one skill
where there were corresponding measures in both languages
(speaking), the correlations were equal for Spanish and English
on Form 1, but not equal for Form 2.

Third, Spanish, the source language, plays a greater role 
in the Accuracy score than in the Expression score. The data 
here showed that Spanish correlated higher with Accuracy than 
with Expression for the three skills measured (Spanish speaking, 
listening, and reading). 

These conclusions about the role of proficiency in the two 
languages in the various scores provide additional insights into 
the skills required for Spanish into English translation.

8. Construction of Translation Skill Level Score Conversion

This section describes the construction of tables to
convert raw scores on the SEVTE for Expression and Accuracy to
FBI/CAL Translation Skill Levels (TSLs). In order to make
decisions on the basis of test scores, compare test scores across
forms, and interpret test scores, raw scores on the SEVTE must be
converted to TSL scale scores.

8.1. Overview

In most of the preceding discussion of the SEVTE, raw
scores have been used.* However, one of the goals of the
project was to be able to interpret test scores in a way that is
grounded in the Translation Skill Level Descriptions.** This
entailed the construction of raw score-to-TSL score conversion
tables for Expression and Accuracy for each section and each form
of the test. These are presented in Appendix O.

Construction of the scaled score conversion tables is an
attempt to give interpretative meaning to the SEVTE raw scores.
In addition, it enables the comparison of total scores across
forms and, to an extent, across the Multiple Choice section on
the two forms. Conversion into scaled scores takes into account
differences in test difficulty. Thus, a comparison of results
across test forms and subtests must only be made in terms of the
TSL scores.

*Weighted scores were used for many of the correlations
involving Form 2 Expression scores.

**The Statement of Work in the RFP issued by the FBI for
this project called for the development of a test "which would
ultimately result in a score which can be converted to the 0
through 5 scale."

8.2. Determining Contributors to Expression and Accuracy Total
Scores

Given the format of the test and the scoring system, there
was a total of 185 possible points on the test when all the
subscores were added together. However, after the data were
collected, it became apparent that there should be separate
scores for Expression and Accuracy. (See the discussion of
the history of the SLDs and the discussion of the constructs in
sections 1.4.1. and 1.5.3.) Based on our conceptualization of
the constructs, it was clear that scores for paragraph expression
(PEX), paragraph grammar (PGR) and paragraph mechanics (PME)
should contribute to the total Expression score, while sentence
accuracy (SAC) and paragraph accuracy (PAC) should contribute to
the total Accuracy score. To determine to which score the
Multiple Choice (MC) section and the Word and Phrase Translation
subsection belonged, a multiple-regression "r-square" analysis
was performed. An r-square analysis determines the r-square
value (percent of variance shared by the combination of the
variables with the criterion) of all combinations of the
variables entered into the equation when regressed on the
criterion (overall EXPFBICAL and overall ACCFBICAL). Both MC
scores and Word and Phrase Translation scores were entered into
the r-square analysis together with PEX, PGR and PME, using the
overall FBI/CAL Expression score as a criterion. In addition,
both MC scores and Word and Phrase Translation scores were
entered into the r-square analysis together with SAC and PAC,
using the overall FBI/CAL Accuracy score as a criterion. The
results of all the r-square analyses (Expression and Accuracy
scores for the two forms of the SEVTE and the two forms of the
ESVTE) were examined together. The results indicated that,
although MC and Word and Phrase Translation scores contributed to
both Expression and Accuracy scores, the most parsimonious
combination of scores was for MC to be used as a subscore for
Expression and Word and Phrase Translation as a subscore for
Accuracy.
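The r-square analysis described above can be illustrated with a minimal sketch. It assumes ordinary least squares with an intercept, predicts the criterion from each combination of subscores, and reports the resulting r-square. The function names and all score values below are hypothetical and are not the study data; the report does not name the software that was actually used.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_square(predictors, criterion):
    """R-square of a least-squares fit (with intercept) of the criterion on the predictors."""
    n = len(criterion)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, criterion)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(criterion) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(criterion, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in criterion)
    return 1.0 - ss_res / ss_tot


def all_subsets(subscores, criterion):
    """R-square for every non-empty combination of the named subscores."""
    names = list(subscores)
    results = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            results[combo] = r_square([subscores[name] for name in combo], criterion)
    return results


# Hypothetical subscores for eight examinees (illustration only).
accuracy_subscores = {
    "MC": [22, 30, 35, 28, 41, 38, 25, 44],
    "WP": [12, 11, 17, 15, 18, 21, 10, 19],
    "SAC": [14, 18, 17, 13, 26, 20, 16, 25],
    "PAC": [20, 29, 30, 27, 38, 33, 19, 43],
}
overall_accuracy_tsl = [1.5, 2.0, 2.8, 2.0, 3.5, 3.0, 1.8, 3.8]

results = all_subsets(accuracy_subscores, overall_accuracy_tsl)
for combo, value in sorted(results.items(), key=lambda item: item[1]):
    print(" + ".join(combo), round(value, 3))
```

Comparing the r-square values of combinations that include or omit MC and Word and Phrase Translation is the kind of inspection that leads to a parsimonious choice: a subscore that adds little to r-square for a given criterion is a candidate for assignment to the other total score.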

Once these combinations of subscores were determined, we
examined whether there was anything to be gained by
differentially weighting the different subscores to produce the
total score. Regressions were run to determine the maximum
amount of variance shared between the optimal combination of
subscores and the corresponding criterion variable. These were
compared to forming total scores without differential weighting.
This analysis revealed that little was to be gained by weighting
in all cases except the total Expression score for Form 2 of the
SEVTE. The correlation with the FBI/CAL translation skill level
rating for Expression was significantly improved by the
assignment of different weights to the Form 2 Expression
subsections. Thus, the weights for Form 2 Expression were set as
follows:

     Total Form 2 Expression =  .289 x Form 2 MC +
                               1.920 x Form 2 PGR +
                                .456 x Form 2 PME +
                               3.466 x Form 2 PEX.

This combination of weights indicates that paragraph
expression and paragraph grammar receive greater emphasis while
paragraph mechanics and the total multiple choice section scores
receive lesser emphasis than in the Form 1 total Expression
score, which is scored solely on the basis of raw score points.
SEVTE Form 2 was the only one of the six test forms developed as
part of this project that profited significantly from
differential weighting.
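As a worked illustration of the weighting, the sketch below applies the four Form 2 Expression weights stated above to one set of subscores. The weights come from the text; the function name and the example subscores are hypothetical.

```python
# Weights for the Form 2 total Expression score, as given in the text.
FORM2_EXPRESSION_WEIGHTS = {
    "MC": 0.289,   # Multiple Choice section
    "PGR": 1.920,  # paragraph grammar
    "PME": 0.456,  # paragraph mechanics
    "PEX": 3.466,  # paragraph expression
}


def form2_expression_total(subscores):
    """Weighted total Expression score for SEVTE Form 2."""
    return sum(weight * subscores[name]
               for name, weight in FORM2_EXPRESSION_WEIGHTS.items())


# Hypothetical raw subscores for one examinee (illustration only).
example_subscores = {"MC": 40, "PGR": 10, "PME": 8, "PEX": 12}
weighted_total = form2_expression_total(example_subscores)
print(f"{weighted_total:.3f}")  # 11.560 + 19.200 + 3.648 + 41.592 = 76.000
```

For Form 1, by contrast, the total Expression score is the unweighted sum of the same raw subscores.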

8.3. Development of Raw Score to Scaled Score Conversion Tables

Since one of the goals of the project was to provide
translation ability scores based on the TSL descriptions, it was
necessary to identify a procedure that would anchor SEVTE scores,
which are analytical, to the holistic TSL descriptions. This was
accomplished during the validation study (see section 7.2) by
having each rater assign to each paper, separately for Expression
and Accuracy, a translation proficiency skill level based on the
FBI/CAL translation skill level descriptions. This procedure
produced four holistic proficiency ratings for Accuracy and four
holistic proficiency ratings for Expression. These two sets of
four holistic proficiency ratings were then averaged separately, to
give each examinee an overall FBI/CAL TSL score for Expression
and Accuracy.

To develop a conversion table of raw SEVTE scores to TSL
scores, total raw scores for Expression and Accuracy for all
subjects were averaged between raters, with the Expression score
for Form 2 being weighted. These total raw scores were then
regressed on the corresponding overall FBI/CAL translation skill
level (Expression or Accuracy). As shown in Table 15,
correlations between the total SEVTE scores and these overall
scores were very high: from .85 to .89 for Expression and from
.89 to .92 for Accuracy. These high correlations produced
optimal regression equations for predicting TSL scores from raw
scores on each form of the test. These equations were then used
to produce predicted TSL scores from all possible SEVTE scores
for each form.* These conversion tables are presented in
Appendix O.
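The procedure in this section can be sketched as follows, assuming a simple linear regression that predicts the overall TSL score from the total raw score. The ratings, the raw totals, the maximum raw score of 90, and the clipping of predictions to the 0 through 5 scale are all assumptions made for illustration; the actual conversion tables are those in the appendix.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def conversion_table(raw_totals, overall_tsl, max_raw):
    """Predicted TSL score for every possible raw score from 0 to max_raw."""
    slope, intercept = fit_line(raw_totals, overall_tsl)
    table = {}
    for raw in range(max_raw + 1):
        predicted = intercept + slope * raw
        # Assumption: keep predictions on the 0 through 5 scale.
        table[raw] = round(min(5.0, max(0.0, predicted)), 1)
    return table


# Hypothetical holistic ratings: four ratings per examinee, one row each.
rater_tsl = [
    [1.0, 1.8, 1.0, 1.0],
    [2.0, 2.0, 2.8, 2.0],
    [2.8, 3.0, 2.8, 3.0],
    [3.0, 3.8, 3.0, 3.8],
    [4.0, 3.8, 4.0, 4.0],
]
overall_tsl = [mean(ratings) for ratings in rater_tsl]  # overall TSL per examinee

# Hypothetical total raw scores (already averaged between raters).
raw_totals = [21, 38, 52, 61, 74]

table = conversion_table(raw_totals, overall_tsl, max_raw=90)
for raw in (20, 40, 60, 80):
    print(raw, table[raw])
```

Because each form has its own regression equation, the same raw score can map to different TSL scores on different forms, which is why comparisons across forms are made on the TSL scale.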

*For a considerable number of examinees on each form of the
test, this regression line resulted in a perfect prediction.
That is, the overall TSL rating predicted by applying the
regression line to the raw score (or weighted score in the case
of Form 2 Expression) coincided exactly with the average TSL
rating assigned by the raters. However, there was a tendency
toward greater error among examinees who scored higher on the
SEVTE. This was due to a number of causes, including the
regression effect, sampling, and the speededness of the Paragraph
Translation subsection during the validation study. For
additional information on the accuracy of predicted Translation
Skill Levels see CAL's memo to the FBI dated May 15, 1990, in
Appendix P.

8.4. Using the Multiple Choice Section as a "Screen"

The Multiple Choice section of the SEVTE may be used to
screen out individuals for whom the production section of the
test is inappropriate. Section 2.4 of this report describes how
it was determined to use the multiple choice section score as a
screen. The Multiple Choice score selected (mentioned below) is
the best predictor of a TSL rating of 2.0 on the combined
multiple-choice and production sections of the SEVTE. Examinees
who score below this level are unlikely to score a 2.8 (2+) or
above on the total test after their raw score has been converted
to the corresponding TSL score for Accuracy. The SEVTE total
score corresponding to a TSL of 2+ is the recommended passing
score; that is, the score at which examinees can serve as
translators for the FBI.

In using the SEVTE MC as a screen, the most serious error 
one can make is to exclude someone from taking the Production
section who may ultimately score a 2+ or above. Giving the 
Production section to someone who may not ultimately score 2+ or 
above is not a serious error, since this individual will 
ultimately be evaluated correctly (after the production section 
is scored). To determine the cut-off score on the Multiple 
Choice section, we need to determine the raw score on the 
Multiple Choice section that corresponds to a TSL score of 2; 
that is, we need to determine the raw score on the MC section 
that corresponds to a translation proficiency level of 2 for 
Accuracy. 

To determine the raw score on the MC section that
corresponds to a score of 2, raw scores on the MC section were
regressed on the overall Accuracy scores. (Note that for Form 1
the correlation between these two scores was .76; for Form 2 it
was .69. The root mean square error of the regression for Form 1
was .470 of a level; for Form 2 it was .492.) This analysis
revealed that the score of 33 would be the lowest predictor of a
score in the 2 range on Form 1, while 25 would be the lowest
predictor of that score for the more difficult Form 2. These,
then, are the recommended cut-off scores on the Multiple Choice
section. Examinees who score below this level on the Multiple
Choice section of the SEVTE either need not take the production
section, or if they already have, that section need not be
scored.
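One way to picture the cut-off determination is the sketch below. It assumes a simple linear regression that predicts the overall Accuracy TSL from the Multiple Choice raw score and then searches for the lowest raw score whose predicted TSL reaches 2.0. The direction of the regression, the function names, the maximum MC score of 60, and all data values are assumptions for illustration; the cut-offs of 33 and 25 reported above come from the study's own data, which are not reproduced here.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def lowest_score_predicting(mc_raw, accuracy_tsl, target_tsl, max_mc):
    """Lowest MC raw score whose predicted Accuracy TSL is at least target_tsl."""
    slope, intercept = fit_line(mc_raw, accuracy_tsl)
    for score in range(max_mc + 1):
        if intercept + slope * score >= target_tsl:
            return score
    return None  # no score on the section predicts the target level


# Hypothetical MC raw scores and overall Accuracy TSL scores (illustration only).
mc_raw = [18, 24, 29, 33, 37, 42, 48, 55]
accuracy_tsl = [1.0, 1.5, 1.8, 2.0, 2.5, 2.8, 3.2, 3.8]

cut_off = lowest_score_predicting(mc_raw, accuracy_tsl, target_tsl=2.0, max_mc=60)
print("Examinees scoring below", cut_off, "on MC would be screened out")
```

Setting the target at 2.0 rather than at the passing level of 2+ makes the screen deliberately lenient, which matches the stated priority of not excluding anyone who might ultimately pass.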

Using these cut-off scores would still leave in many
examinees who may not ultimately achieve a score at or above 2+
in Accuracy on their total test; however, the probability of
excluding a candidate who might achieve a 2+ in Accuracy on the
total test is minimal.

APPENDIX A



ADMINISTRATION INSTRUCTIONS FOR SEVTE

TEST ADMINISTRATION INSTRUCTIONS



SPANISH INTO ENGLISH VERBATIM TRANSLATION EXAM




NOTE TO TEST ADMINISTRATOR 

This manual describes important information about the
procedures that must be followed BEFORE, DURING, and AFTER the
administration of the translation exams. Uniform procedures are
essential for the translation exams to yield reliable test results. The
scores of all examinees from various field offices in the nation will be
comparable only if all test administrators follow the same procedures
and give exactly the same instructions. It is necessary, therefore, that
you read the entire manual before administering the exams and follow
the instructions without exception when administering exams.

GENERAL INFORMATION



Test Security 

It is extremely important that the translation exams be safeguarded and
administered under secure conditions at each field office. In order to ensure test
security, it is essential that you adhere to the following conditions:

1. Keep all test materials either in your immediate physical possession or in a
locked cabinet or other secure area under your control.

2. Do not copy, or allow others to copy, any portion of the test booklets or tape, or
make any notes or transcriptions of the test booklet or tape content.

3. Allow only those particular individuals who are to be tested to see the test
materials, and only at the time of test administration and under the specific
procedures described in this manual.

4. Should any irregularities occur, report them on the Test Administrator Report
Form included in the test package. Please complete and sign this form even if
no irregularities occur.



PRIOR TO THE TESTING DATE 

Assembling Test Materials

Assemble as many test booklets and answer sheets as will be needed for the test
administration, including two or three extra copies of each. You should also have on
hand at least two no. 2 pencils (with erasers) for each examinee. Listed below are the
materials needed for each exam:

1) Multiple Choice Section test booklets

2) Production Section test booklets

3) Answer sheets

4) No. 2 pencils

5) A timer, wristwatch or other timepiece which can be reset



Arranging for a Testing Site

Locate a testing site that is comfortable and free from distraction. The testing
room should be large enough so that examinees can be seated with three feet of space
in all directions between all examinees.

ON THE TESTING DATE



Check to make sure the timepiece is functioning properly and has been
completely reset to zero (or 12:00). There should always be at least two timepieces in
the testing room as a check against mistiming.

Prohibited Materials 

While taking the Multiple Choice Section and the Translation of Words and
Phrases in Context and Sentence Translation Section, examinees should not have
anything on their desks except their pencils, test booklets, and answer sheets.
Examinees may use dictionaries only during the Paragraph Translation Section.

Administering the Test

Follow the procedures below when administering the test. All instructions within
the boxes should be read verbatim. Pause where four dots appear to allow time for the
procedure described to be carried out. Be sure you state the correct form where
appropriate. Do not depart from these directions unless noted otherwise.

1. After all examinees have been seated, distribute the Multiple Choice Section test
booklets, answer sheets, and pencils.

2. Give the following instructions:



Please do not open your test booklet. In this section of the exam, you will
mark all of your answers on the answer sheet. Do not write anything in the test
booklet. You must use a no. 2 pencil for marking your answers.

3. Instruct the examinees how to fill out the answer sheet:



Place your answer sheet on top of your test booklet. Turn the answer sheet so
that you see SIDE ONE in the upper right hand corner....

On the left half of side one, you will see an area containing [illegible] lines. At the
top of this section is the word NAME. Print your name in the boxes provided.
Print your last name, and then your first name. Leave a blank space between
your last name and your first name....

Now fill in the circles beneath the boxes in which you printed your name.
Each circle you fill in must correspond to the letter you printed in the box above.
Be sure that you darken the circle so that the letter within the circle is completely
covered. You should not be able to see the letter. If you make a mistake, erase
the mistake completely. Do not make any extra marks on your answer sheet.
Your answer sheet will be scored by a machine. If you do not mark it carefully, it
may not be processed accurately by the scoring machine.

Now find the section labeled IDENTIFICATION NUMBER in the bottom left
half of your answer sheet. Print your SOCIAL SECURITY NUMBER in the boxes
labeled A through I....

Now fill in the circles beneath the boxes in which you printed your social
security number. Each circle you fill in must correspond to the number you
printed in the box above.

Now find the section labeled SPECIAL CODES, located to the right of the
section you just completed. [GIVE THE FOLLOWING INSTRUCTIONS IN
ACCORDANCE WITH THE FORM NUMBER OF THE EXAM YOU ARE NOW
ADMINISTERING:] Print the number [ONE or TWO] in box K. This is
[FORM 1 or FORM 2] of the Spanish into English Verbatim Translation exam.
You do not need to fill in your birth date, sex, or level of education....

Now look at the right half of your answer sheet. Notice that the first fifty
items are arranged in columns in the top section of the answer sheet, while the
next fifty items are arranged in the bottom section. Make sure you follow the
order of the items as they are marked. For example, after question number ten,
you will need to return to the top of the section to mark your answer to question
number eleven.

Are there questions?.... Try to answer every item, but do not be concerned if
you cannot answer all of them. You will not be penalized for guessing. If you
are unsure of the answer to a question, make the best guess you can and go on to
the next question. The verbatim translation exam takes approximately two hours
and ten minutes to complete....



4. Instruct the examinees to begin the Multiple Choice Section:



Now remove from your desk everything except your test booklet, answer sheet,
pencils, and erasers....

Look at your test booklet for the Multiple Choice Section of the Spanish into
English Verbatim Translation Exam. Print your name in the space provided on
the cover. Print your last name first....

Print today's date in the space provided....

There are two parts in this section. You will be allowed a total of thirty-five
minutes to complete both parts. I will advise you when there are five minutes
remaining. You may now open your test booklets and begin the test. [START
TIMER IMMEDIATELY]



5. Walk about the room to make sure that everyone is marking their answers
correctly on the answer sheet.



6. After 30 minutes, inform examinees:



There are five minutes remaining to complete this section.

7. After 35 minutes, STOP AND RESET THE TIMER. Inform examinees:



This is the end of the Multiple Choice Section. Please stop working now.
Now look over your answer sheet carefully. Be sure all the marks you made are
dark and heavy. Insert your answer sheet in your test booklet and close the
booklet....



8. Collect the test booklets and answer sheets for the Multiple Choice Section. Be
sure to account for all test booklets distributed.

9. Distribute the Words and Phrases in Context and Sentence Section booklets.
Instruct the examinees to begin this section:



There are two parts in the next section. You may not use your dictionary
during this section. You will be given 35 minutes to complete the two parts in
this section, the Translation of Words and Phrases in Context and Sentence
Translation. I will advise you when there are five minutes remaining to finish this
section. You may now open your test booklets and begin working. [START
TIMER IMMEDIATELY]



10. After 30 minutes, inform examinees:



There are five minutes remaining to complete this section.



11. After 35 minutes, STOP AND RESET THE TIMER. Inform examinees:



Please stop working now. We will now have a short rest break. We will begin
the Paragraph Translation Section in five minutes. You may leave the room if
you wish.

12. Collect the test booklets for the Words and Phrases in Context and Sentence
Section. Be sure to account for all test booklets distributed.

13. Distribute the Paragraph Translation Section booklets. Instruct the examinees to
begin the Paragraph Translation Section:



We will now begin the Paragraph Translation Section. In this section you will
translate three paragraphs. You may use dictionaries during this part of the
exam. You will have 48 minutes to complete the Paragraph Translation Section.
I will inform you when there are five minutes remaining. When you have finished
this section, please close your test booklets and wait for further instructions. You
may now begin. [START TIMER IMMEDIATELY]



14. After 43 minutes, inform examinees:



There are five minutes remaining.



15. After 5 more minutes, inform examinees:



Please stop working now. Close your test booklets.



Collect the test booklets for the Paragraph Translation Section.
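The timing in the steps above can be summarized with a small sketch that lists when each announcement falls on a single running clock. It assumes the section lengths read from the instructions (35, 35, and 48 minutes, with a 5-minute break) and ignores the time spent giving instructions, which is why the total is shorter than the two hours and ten minutes quoted to examinees. The names are hypothetical, and in practice the timer is reset for each section rather than run continuously.

```python
# (section name, length in minutes, whether a five-minute warning is announced)
SECTIONS = [
    ("Multiple Choice", 35, True),
    ("Words and Phrases in Context and Sentence Translation", 35, True),
    ("Rest break", 5, False),
    ("Paragraph Translation", 48, True),
]
WARNING_MINUTES = 5


def schedule(sections, warning=WARNING_MINUTES):
    """Return (events, total_minutes); each event is (elapsed_minutes, description)."""
    events = []
    clock = 0
    for name, minutes, warned in sections:
        events.append((clock, f"start {name}"))
        if warned:
            events.append((clock + minutes - warning,
                           f"announce {warning} minutes remaining"))
        clock += minutes
        events.append((clock, f"stop {name}"))
    return events, clock


events, total_minutes = schedule(SECTIONS)
for elapsed, description in events:
    print(f"{elapsed:3d} min  {description}")
print("total timed minutes:", total_minutes)
```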



Test Administrator Report Form
SPANISH INTO ENGLISH VERBATIM TRANSLATION EXAM



This form is to be used to report any irregularities in test administration. Please fill it out
(even if there were no irregularities), sign your name, and return it with the test materials.
Thank you.



Test Security

By agreeing to serve as the test administrator, I am responsible for ensuring the security of
the test. I have kept the test materials confidential and secure at all times. None of the test
booklets or test tapes has been reproduced in any form.

Irregularities:



Test Administration

The tests were administered in exact accordance with the procedures described in the
Administration Manual. Any deviations from the stated procedures are listed below:

Irregularities:



Condition of Test Materials

Before returning the test materials, I have checked the condition of the test booklets and
test tapes. All materials are being returned in their original condition.

Irregularities:



(Please print name) 



Field Office 



Signature 



Date 



APPENDIX B 



MULTIPLE CHOICE SECTION TITLE PAGE AND 
INSTRUCTIONS 



NAME 



Last First 



SPANISH INTO ENGLISH VERBATIM TRANSLATION EXAM

MULTIPLE CHOICE SECTION
FORM 1




This test is for official use only. Do not divulge any information contained herein.
Do not duplicate any portion of this test. Do not show to unauthorized persons.



FIELD OFFICE



TEST NO.

SPANISH INTO ENGLISH VERBATIM TRANSLATION EXAM (SEVTE)
MULTIPLE CHOICE SECTION: INSTRUCTIONS AND EXAMPLE ITEMS



EMBEDDED PHRASE ITEMS

Instructions: Choose the best translation for the underlined portions of the following
sentences. If there is more than one possible answer, choose the most appropriate
translation. Consider how the entire sentence should be translated when choosing the
correct answer. On your answer sheet, find the number of the question and blacken
the space that corresponds to the letter of the answer you have chosen.

Example: Dicen que mañana va a llover.

     (A) to snow
     (B) to cry
     (C) to rain
     (D) to call

Discussion: The translation of the full sentence is, They say that tomorrow it's going to
rain. To rain is the correct translation of llover; therefore, the answer is (C). You
would blacken the space marked (C) on your answer sheet.

ERROR DETECTION ITEMS

Instructions: Blacken the space corresponding to the letter of the incorrect part of the
sentence on your answer sheet. If there is no error, choose (D). There cannot be
more than one error in each sentence. Possible errors include incorrect grammar,
word order, vocabulary, punctuation or spelling.

Example: You shouldnt forget to call her tomorrow.

                  A          B            C

Discussion: The apostrophe has been omitted from the contraction shouldn't;
therefore, the correct choice is (A). You would blacken the space marked (A) on your
answer sheet.

PRODUCTION SECTION TITLE PAGE AND TEST INSTRUCTIONS



NAME 



Last First 
DATE 



SPANISH INTO ENGLISH VERBATIM TRANSLATION EXAM 

PRODUCTION SECTION 
FORM 1




This test is for official use only. Do not divulge any information contained herein.
Do not duplicate any portion of this test. Do not show to unauthorized persons.



FIELD OFFICE



TEST NO. 



SPANISH INTO ENGLISH VERBATIM TRANSLATION EXAM (SEVTE)
PRODUCTION SECTION: INSTRUCTIONS AND EXAMPLE ITEMS



TRANSLATION OF WORDS AND PHRASES IN CONTEXT

Instructions: After you have read each of the following sentences, translate the
underlined portion into English. Strive for a natural, grammatical rendition which
doesn't modify the original meaning. Consider how the entire sentence would be
translated before providing your answer. Use the spaces below each sentence.

Example: Les he contado mucho de ti a mis padres.

I have told

Discussion: In this case, the pronoun les is not translated because the meaning is
already contained in the translation of the full noun phrase of the indirect object: my
parents. The translation of the complete sentence would be: I have told my parents a
lot about you. It would not be correct in English to use both the pronoun them and
the noun phrase my parents in this sentence.

SENTENCE TRANSLATION

Instructions: After you have read the following sentences, translate them into English.
Use the spaces provided. Make sure your rendition sounds natural in English while
retaining the original meaning.

Example: Los países en vías de desarrollo necesitan la ayuda de las naciones
industrializadas.

Developing countries need the assistance of industrialized nations.

Discussion: Note that developing countries is an appropriate translation of the
idiomatic expression países en vías de desarrollo. A more literal translation (i.e.,
countries on the road to development) would not sound natural in English. Note also
that the definite article the is not used in the English translation of either plural noun
phrase (i.e., developing countries and industrialized nations). Additionally, the placement
of the adjective industrialized is in front of the noun in English.

PARAGRAPH TRANSLATION

Instructions: Translate the following paragraphs into English. Again, strive for a
natural rendition without changing the original meaning. You are permitted to use a
dictionary during this section only. Do not return to work on previous sections.

APPENDIX D



CONTENT ANALYSIS OF SEVTE MULTIPLE CHOICE SECTIONS 



CONTENT ANALYSIS



SPANISH-ENGLISH (EXAM I)

1. vocabulary - adverbial phrase

2. vocabulary - idiom (complete phrase)

3. vocabulary - adverb

4. grammar - use of subjunctive

5. vocabulary - conjunction

6. vocabulary - verb phrase

7. vocabulary - adverbial phrase

8. vocabulary - adverbial phrase

9. vocabulary - verb phrase

10. vocabulary - false cognate (verb)

11. a. vocabulary - verb phrase
    b. grammar - use of subjunctive

12. vocabulary - false cognate (verb)

13. vocabulary - false cognate (verb)

14. vocabulary - verb phrase

15. vocabulary - false cognate (adjective)

16. a. vocabulary - verb
    b. grammar - use of subjunctive

17. vocabulary - adverb

18. vocabulary - false cognate (adverb)

19. vocabulary - adverbial phrase

20. vocabulary - noun phrase

21. vocabulary - verb phrase

22. vocabulary - noun

23. vocabulary - adjective

24. vocabulary - false cognate (noun phrase)

25. vocabulary - false cognate (noun)

26. vocabulary - proverb

27. vocabulary - false cognate (verb)

28. vocabulary - noun

29. grammar - use of subjunctive

30. vocabulary - verb phrase

31. vocabulary - verb phrase

32. vocabulary - verb phrase

33. vocabulary - verb phrase

34. vocabulary - noun phrase

35. vocabulary - verb

36. spelling

37. grammar - verb form (past participle)

38. grammar - subject-verb agreement

39. grammar - verb form

40. grammar - verb form

41. spelling

42. grammar - use of pronoun (subject-verb agreement with
    pronoun)

43. vocabulary - false cognate (adjective)

44. No error

45. vocabulary - false cognate (noun)

46. grammar - subject-verb agreement

47. No error

48. grammar - verb form (use of infinitive vs. present
    participle)

49. punctuation - use of apostrophe

50. punctuation - comma

51. No error

52. spelling

53. grammar - use of pronoun (inconsistency)

54. grammar - use of pronoun (pronoun-noun agreement)

55. grammar - use of pronoun (subjective vs. objective)

56. grammar - use of pronoun (relative - who/whom)

57. grammar - adjective-noun agreement (less/fewer)

58. grammar - use of pronoun (relative - who/which)

59. vocabulary - conjunction

60. grammar - (lie/lay)



GRAMMAR is tested:                     18 times
   verb form:                           4 times
   use of subjunctive:                  4 times
   subject/verb agreement:              2 times
   use of pronouns:                     6 times
   adjective/noun agreement:            1 time
   lie vs. lay:                         1 time

VOCABULARY is tested:                  36 times
   adjective or adjectival phrase:      3 times (2 FC)
   adverb or adverbial phrase:          7 times (1 FC)
   noun or noun phrase:                 7 times (3 FC)
   verb or verb phrase:                15 times (4 FC)
   proverb:                             1 time
   conjunction:                         2 times
   idiom:                               1 time

PUNCTUATION is tested:                  2 times
SPELLING is tested:                     3 times

NO ERROR appears:                       3 times
CONTENT ANALYSIS
SPANISH-ENGLISH (EXAM II)

1. vocabulary - adverbial phrase
2. vocabulary - idiom (complete phrase)
3. vocabulary - adverbial phrase
4. grammar - use of subjunctive
5. vocabulary - conjunction
6. vocabulary - verb phrase
7. vocabulary - adverbial phrase
8. vocabulary - adverbial phrase
9. vocabulary - adverbial phrase
10. vocabulary - false cognate (verb)
11. a. vocabulary - verb
    b. grammar - use of subjunctive
12. vocabulary - verb
13. a. vocabulary - false cognate (verb)
    b. grammar - use of preposition
14. vocabulary - verb phrase
15. vocabulary - adjective phrase
16. a. vocabulary - verb
    b. grammar - use of subjunctive
17. vocabulary - adverb
18. vocabulary - false cognate (noun)
19. vocabulary - adverb phrase
20. vocabulary - noun phrase
21. vocabulary - verb phrase
22. vocabulary - noun
23. vocabulary - adjective
24. vocabulary - verb phrase
25. vocabulary - false cognate (noun phrase)
26. vocabulary - proverb
27. vocabulary - false cognate (verb phrase)
28. vocabulary - idiom (complete phrase)
29. grammar - use of subjunctive
30. vocabulary - verb phrase
31. vocabulary - verb phrase
32. vocabulary - verb phrase
33. vocabulary - verb phrase
34. vocabulary - noun phrase
35. vocabulary - verb
36. spelling
37. grammar - past participle
38. grammar - subject-verb agreement
39. grammar - verb form
40. grammar - verb form
41. spelling
42. grammar - subject-verb agreement with pronoun
43. vocabulary - false cognate (noun)
44. No error
45. vocabulary - false cognate (noun)
46. grammar - subject-verb agreement
47. No error
48. grammar - verb form (infinitive vs. present participle)
49. punctuation - use of apostrophe
50. punctuation - comma
51. No error
52. spelling
53. grammar - use of pronoun (inconsistency)
54. grammar - use of pronoun (pronoun-noun agreement)
55. grammar - use of pronoun (subjective-objective)
56. grammar - use of pronoun (relative - who/whom)
57. grammar - noun-adjective agreement (less/fewer)
58. grammar - use of pronoun (relative - who/which)
59. vocabulary - conjunction
60. grammar - lie vs. lay



GRAMMAR is tested:                     19 times
   verb form:                           4 times
   use of subjunctive:                  4 times
   subject/verb agreement:              2 times
   use of pronouns:                     6 times
   adjective/noun agreement:            1 time
   lie vs. lay:                         1 time
   use of prepositions:                 1 time

VOCABULARY is tested:                  36 times
   adjective or adjectival phrase:      2 times
   adverb or adverbial phrase:          7 times
   noun or noun phrase:                 7 times (4 FC)
   verb or verb phrase:                15 times (3 FC)
   proverb:                             1 time
   conjunction:                         2 times
   idiom:                               2 times

PUNCTUATION is tested:                  2 times
SPELLING is tested:                     3 times

NO ERROR appears:                       3 times
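
The category totals reported for the two forms can be cross-checked against their subcategory counts. The following is a minimal Python sketch that uses only the counts listed in the two content analyses; the names `EXAM_I`, `EXAM_II`, `category_totals`, and `form_differences` are illustrative and do not come from the report.

```python
# Subcategory counts transcribed from the two content analyses.
# All identifiers are illustrative; they are not part of the SEVTE materials.
EXAM_I = {
    "grammar": {
        "verb form": 4,
        "use of subjunctive": 4,
        "subject/verb agreement": 2,
        "use of pronouns": 6,
        "adjective/noun agreement": 1,
        "lie vs. lay": 1,
    },
    "vocabulary": {
        "adjective or adjectival phrase": 3,
        "adverb or adverbial phrase": 7,
        "noun or noun phrase": 7,
        "verb or verb phrase": 15,
        "proverb": 1,
        "conjunction": 2,
        "idiom": 1,
    },
}

# Exam II repeats the Exam I counts except where overridden below.
EXAM_II = {
    "grammar": {**EXAM_I["grammar"], "use of prepositions": 1},
    "vocabulary": {
        **EXAM_I["vocabulary"],
        "adjective or adjectival phrase": 2,
        "idiom": 2,
    },
}


def category_totals(form):
    """Sum the subcategory counts within each major category."""
    return {category: sum(subcounts.values()) for category, subcounts in form.items()}


def form_differences(form_a, form_b):
    """Return {(category, subcategory): (count_a, count_b)} where the forms differ."""
    diffs = {}
    for category in form_a.keys() | form_b.keys():
        sub_a = form_a.get(category, {})
        sub_b = form_b.get(category, {})
        for name in sub_a.keys() | sub_b.keys():
            pair = (sub_a.get(name, 0), sub_b.get(name, 0))
            if pair[0] != pair[1]:
                diffs[(category, name)] = pair
    return diffs


if __name__ == "__main__":
    print(category_totals(EXAM_I))
    print(category_totals(EXAM_II))
    print(form_differences(EXAM_I, EXAM_II))
```

Under these counts the subcategories sum to the reported category totals on both forms, and the forms differ in three subcategories: the preposition item that appears only on Exam II, the adjective count, and the idiom count.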
APPENDIX E

SENTENCE ACCURACY SCORING GUIDELINES

FINAL VERSION

SENTENCE ACCURACY SCORING GUIDELINES

0 Translation is less than 50% complete.

1 Many mistranslations, omissions, and/or inappropriate additions, so that much of the
meaning is lost.

2 Mistranslation or omission of one or more key terms (including verb tense), and/or
inappropriate additions.

3 Mistranslation or omission of one or more minor terms; no inappropriate additions.

4 No mistranslations or omissions, although some nuance may not be conveyed.

5 All nuances conveyed.
APPENDIX F

PARAGRAPH SCORING GUIDELINES

FINAL VERSION

SEVTE PARAGRAPH SCORING GUIDELINES

GRAMMAR* (Structure and Morphology)

0 (Translation less than 50% complete.)

1 Majority of structures are incorrect.

2 Some errors in basic structures and numerous errors in complex structures.

3 Errors in basic structures are rare. Sporadic errors in high frequency complex
structures; some errors in low frequency complex structures.

4 No more than one error in a complex structure.

5 No grammar errors.

EXPRESSION (Word Order, Vocabulary, Idiomaticity, Style, and Tone)

0 (Translation less than 50% complete.)

1 Expression generally equivalent to source language; unacceptable in target
language.

2 Expression closer to source language; generally unacceptable in target language.

3 Expression usually follows target language conventions, but is not always
preferred.

4 Expression occasionally reveals translation. Appropriate register.

5 No evidence of translation.

MECHANICS (Spelling, Punctuation, and Capitalization)

0 (Translation less than 50% complete.)

1 Numerous errors in spelling or punctuation.

2 Frequent errors in spelling or punctuation.

3 Occasional errors in spelling or punctuation.

4 Rarely makes errors in spelling or punctuation.

5 Almost no errors in spelling or punctuation.

ACCURACY

0 (Translation less than 50% complete or less than 50% accurate.)

1 Many mistranslations, omissions, and/or inappropriate additions, so that much of
the meaning is lost.

2 Mistranslation or omission of one or more key terms (including verb tense),
and/or inappropriate additions.

3 Mistranslation or omission of one or more minor terms; no inappropriate
additions.

4 No mistranslations or omissions, although some nuance may not be conveyed.

5 All nuances conveyed.

* Use the information on the following page as a guide in distinguishing errors in
basic, high frequency complex, and low frequency complex structures.
1) BASIC STRUCTURES: (subject/verb agreement, number [plural, singular], present
tense, present progressive, simple past, pronouns, comparatives, going to future, 's
possessives, present tense modals [can, will, shall, may, might, must])

2) HIGH FREQUENCY COMPLEX STRUCTURES: (articles, present perfect, past
perfect, past progressive, past modals [could, would], perfect modals [must, could,
might, may + have], used to, derivational endings [noun, adjective, adverb, verb
endings], relative clause pronouns, tense sequencing, prepositions)

3) LOW FREQUENCY COMPLEX STRUCTURES: (gerunds vs. infinitives,
subjunctive, conditional tense, future perfect, compound tenses [past perfect progressive,
future perfect progressive, etc.], two word verbs [take over, take on, take up, etc.])

APPENDIX G

PILOT VERSION OF SENTENCE SCORING GRID

PILOT VERSION

SENTENCE SCORING GRID

GRAMMAR

0 Less than 50% complete.

1 One or more errors in basic structures.

2 One or more errors in high frequency complex structures.

3 One or more errors in low frequency complex structures.

4 One error in a very low frequency complex structure.

5 No errors.

EXPRESSION

0 Less than 50% complete.

1 Expression generally equivalent to source language; unacceptable in target language.

2 Expression closer to source language; generally unacceptable in target language.

3 Expression follows target language conventions, but is not preferred.

4 Expression gives subtle indication of translation. Appropriate register.

5 No evidence of translation.

MECHANICS

0 Less than 50% complete.

1 Four errors

2 Three errors

3 Two errors

4 One error

5 No error

ACCURACY

0 Less than 50% complete.

1 Many mistranslations, omissions, and/or inappropriate additions.

2 Mistranslation or omission of one or more key terms (including verb tense) and/or
inappropriate additions.

3 Mistranslation or omission of one or more minor terms; no inappropriate additions.

4 No mistranslations or omissions, although some nuance may not be conveyed.

5 All nuances conveyed.
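
The MECHANICS scale of the pilot sentence grid is a direct count of errors, so it can be written as a small function. The following is a minimal Python sketch; the function name and the `at_least_half_complete` flag are illustrative, and the treatment of more than four errors is an assumption, since the grid does not list that case.

```python
def sentence_mechanics_score(error_count, at_least_half_complete=True):
    """Map a count of mechanics errors to the 0-5 pilot sentence scale.

    Grid values: 4 errors -> 1, 3 -> 2, 2 -> 3, 1 -> 4, 0 -> 5.
    A translation that is less than 50% complete scores 0.
    """
    if error_count < 0:
        raise ValueError("error_count must be non-negative")
    if not at_least_half_complete:
        return 0
    # Assumption: the grid stops at four errors, so anything above four
    # is floored at a score of 1 here.
    return max(1, 5 - error_count)


if __name__ == "__main__":
    for errors in range(6):
        print(errors, sentence_mechanics_score(errors))
```

A fixed error count takes no account of sentence length, which may be one reason the final guidelines use frequency descriptors ("occasional", "frequent") instead.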
APPENDIX H

PILOT VERSION OF PARAGRAPH SCORING GRID

PILOT VERSION

PARAGRAPH SCORING GRID

GRAMMAR

0 Less than 50% complete.

1 Majority of structures are incorrect.

2 Some errors in basic structures and numerous errors in complex structures.

3 Errors in basic structures are rare. Sporadic errors in high frequency complex structures;
some errors in low frequency complex structures.

4 No more than one error in a low frequency complex structure.

5 No grammar errors.

EXPRESSION

0 Less than 50% complete.

1 Expression generally equivalent to source language; unacceptable in target language.

2 Expression closer to source language; generally unacceptable in target language.

3 Expression usually follows target language conventions, but is not always preferred.

4 Expression occasionally reveals translation. Appropriate register.

5 No evidence of translation.

MECHANICS

0 Less than 50% complete

1 At least 50% correct

2 At least 70% correct

3 At least 80% correct

4 At least 90% correct

5 At least 99% correct

ACCURACY

0 Less than 50% complete.

1 Many mistranslations, omissions, and/or inappropriate additions.

2 Mistranslation or omission of one or more key terms (including verb tense) and/or
inappropriate additions.

3 Mistranslation or omission of one or more minor terms; no inappropriate additions.

4 No mistranslations or omissions, although some nuance may not be conveyed.

5 All nuances conveyed.
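
The MECHANICS scale of the pilot paragraph grid is a set of percentage cut points, which can be expressed as a threshold lookup. The following is a minimal Python sketch; the names are illustrative, the grid does not say how "percent correct" is to be computed, and scoring a complete translation that is under 50% correct as 0 is an assumption, because the grid defines 0 only in terms of completeness.

```python
from bisect import bisect_right

# Lower bounds (percent correct) for scores 1 through 5 on the pilot grid.
MECHANICS_THRESHOLDS = (50, 70, 80, 90, 99)


def paragraph_mechanics_score(percent_correct, at_least_half_complete=True):
    """Map percent-correct mechanics to the 0-5 pilot paragraph scale."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if not at_least_half_complete:
        return 0
    # bisect_right counts how many thresholds are <= percent_correct.
    # Assumption: values below 50 fall through to 0, a case the grid
    # itself does not describe.
    return bisect_right(MECHANICS_THRESHOLDS, percent_correct)


if __name__ == "__main__":
    for pct in (45, 50, 69.9, 70, 85, 90, 98.9, 99, 100):
        print(pct, paragraph_mechanics_score(pct))
```

The bands are uneven: score 1 spans twenty percentage points, while score 5 requires near-perfect mechanics.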






APPENDIX I 



FBI/CAL TRANSLATION SKILL LEVEL DESCRIPTIONS 
AND QUESTIONNAIRE 
July 26, 1990

FBI/CAL TRANSLATION SKILL LEVEL DESCRIPTIONS

EXPRESSION

0+ Makes very frequent mistakes in spelling, punctuation, and 
representation of symbols. Uses none or almost none of the 
morphology or syntax conventions of the target language. Vocabulary 
is extremely limited and frequently inappropriate, even when using a 
dictionary. Only very simple sentences are correct. Style and tone 
are not identifiable. Renders a translation that appears very 
distorted and for the most part is unintelligible. 

1 Makes frequent spelling and punctuation errors, frequent grammar 
errors in basic structures, and shows little ability to convey verb 
tenses other than the present tense. Syntax is generally equivalent 
to that of source language. Vocabulary is often inappropriate, even
when using a dictionary, and active vocabulary is usually limited to
everyday words and cognates. Renders an extremely literal 
translation, i.e. almost word by word. Has no ability to deal with 
complex sentence patterns. Unable to convey style and tone, unless 
their use in source document is very predictable. Portions of the 
translation are unintelligible and others are clearly distorted; 
however, much of it can be understood by native readers used to 
dealing with foreigners' efforts to translate their language. 

1+ Makes many spelling errors and punctuates according to source language 
conventions. Makes many errors in basic grammatical structures, and 
uses very few low frequency constructions correctly. Uses syntax 
that is very close to that of source language, while vocabulary is 
limited and makes many errors in choice of words, sometimes even when 
using a dictionary. Attempts at complex sentences often result in 
errors. Uses uneven style and tone that do not reflect those of
original document. This person's translated documents appear
distorted but are mostly intelligible to native readers used to
dealing with foreigners' efforts to translate their language.

2 Makes spelling errors, while capitalization and punctuation errors 
reflect source language conventions. Uses syntax that is closer to 
source language than to target language. Makes very frequent errors 
in low frequency grammatical structures, frequent errors in high
frequency grammatical structures, and some errors in basic structures. 
Vocabulary may be generally too limited to convey abstract thoughts. 
Has only some knowledge of idiomatic expressions and colloquialisms, 
and very limited knowledge of sayings and proverbs. Distorts the 
style and/or the tone of the original document and may inappropriately 
combine use of formal and informal patterns of speech. Produces 
translations that are very literal, but are generally understandable 
to a native reader NOT used to dealing with foreigners' efforts to
translate their language. 
2+ Makes some spelling errors, and may use capitalization and punctuation
that imitates usage of source language. Uses syntax that tends to
reflect that of source language. May make frequent errors in low
frequency complex grammatical structures, some errors in high
frequency complex structures, and occasional errors in basic
structures. Has little ability to use complex sentence patterns.
Vocabulary is adequate to express some abstract thoughts; can often
make sensible guesses about unfamiliar words using linguistic context 
and prior knowledge. Has a fair knowledge of idiomatic expressions 
and colloquialisms and only limited knowledge of sayings and proverbs. 
Tone and style are uneven and somewhat distorted. Produces documents 
that are readily understandable but clearly have been translated. 

3 Occasionally makes spelling mistakes, some grammar mistakes in low 
frequency complex structures, sporadic errors in high frequency 
complex structures, and shows no pattern of errors in basic structure. 
Uses punctuation that is almost identical to source document, i.e. 
sometimes atypical of the target language. Moderately good ability 
to join or divide original sentences as required by target language 
constructions, while still retaining the meaning of the source 
document. Moderately good ability to use complex structures, sentence
patterns, and vocabulary appropriate for expressing abstract thoughts.
Moderately good knowledge of idiomatic expressions and colloquialisms,
and some sayings and proverbs, but with occasional misunderstandings.
Uses a number of syntactic constructions that are more characteristic
of source language than target language, thereby producing documents
that appear to be a translation. This person's style and tone are 
even, but occasionally differ slightly from original. 

3+ Makes occasional spelling and punctuation errors. Occasionally makes
grammatical errors in low frequency complex structures, sporadic 
errors in high frequency complex structures. Good ability to use very 
complex sentence structures. Uses some syntactic structures that are 
more typical of source than target language which suggest that the 
document is translated. Vocabulary is generally extensive but usage 
is not always precise given the context, especially in the use of 
register and colloquialisms. The style and tone of the original 
document are not always retained. 

4 This person's errors of grammar are very rare and unpatterned. This
person rarely makes a spelling or punctuation error. Uses some 
syntactic structures that suggest the document is a translation — while 
these are grammatically correct, they are not typical of the target 
language. Very good ability to use highly complex sentence 
structures. Very good knowledge of idiomatic expressions, register, 
colloquialisms, sayings and proverbs and their equivalents in the 
target language. However, a document rendered by this person may 
occasionally reveal itself to be a translation due to atypical use of 
syntax and vocabulary. The style and tone are equivalent to those of 
the source document. 
4+ Makes no grammatical or punctuation errors, and no spelling errors
that would not be made by an educated native writer of the target
language. There are minor problems of syntax, spelling, or
vocabulary, which although grammatically correct are not typical of
the source language and suggest that the document is a translation.
These and other infelicities could only be confirmed by an educated
native reader of both languages who compares the documents in both
the source language and the target language. Uses style and tone that
are a true reflection of source document.

5 Produces work that contains no grammar, spelling or punctuation errors
that would not be made by other well-educated native writers. Can
produce documents whose syntax is that of the target language, with
no influence of source language. Can adapt rhetorical structures so
that the document reads as if it had originally been written in the
target language. Can convey all nuances and can use tone and
stylistic devices that are identical in effect to those of original,
including use of humor.



ACCURACY



0+ Has no real ability to translate connected discourse. Efforts to 
translate contain many mistranslations and omissions, and very little
information from source document is conveyed. 

1 Renders translations whose accuracy is deficient, with frequent 
mistranslations and omissions, and may make inappropriate additions.
Much of the information from longer source documents is lost. 

1+ Produces translations whose accuracy is inadequate, containing many
mistranslations or omissions, and possibly additions. Almost all 
nuances are lost. 

2 Produces translations whose accuracy is mostly adequate and without 
severe substantive omissions, but without many nuances, and with quite 
a few mistranslations. May include some additions for clarification 
of areas the translator can not accurately convey. 

2+ Produces translations whose accuracy is adequate, but contain some 
mistranslations or omissions, and reflect a limited ability to convey 
nuances. 

3 Produces translations whose accuracy is good, with occasional minor 
mistranslations or omissions. Can handle clearly identifiable 
nuances. 

3+ Produces translations whose accuracy is very good; there are 
occasional omissions, or sporadic minor mistranslations; nuances and 
subtleties are not always conveyed exactly or not at all. 

4 Renders translations whose accuracy is excellent; almost all nuances 
are conveyed and there are no mistranslations. 

4+ Can produce documents that are totally accurate, convey all nuances,
and are devoid of mistranslations or omissions. 

5 Can produce translations that are an exact reflection of the source 
document in all aspects, even translating difficult and abstract
prose. Can produce work that is totally accurate, with no 
mistranslations or omissions. 
Interpretive Information

T-0 NO PROFICIENCY

No ability to translate the language.

T-0+ MEMORIZED PROFICIENCY

Able to translate using only memorized material and expressions,
such as numbers, dates, addresses, some street signs and shop
designations.

T-1 ELEMENTARY PROFICIENCY
(Base Level)

Able to translate very simple documents in printed or typed form
at the survival level such as simple messages and simple notes
conveying basic instructions.

T-1+ ELEMENTARY PROFICIENCY
(Higher Level)

Able to translate simple documents in printed or typed form
dealing with survival needs and routine social demands such as
simple letters and biographical data.

T-2 LIMITED WORKING PROFICIENCY
(Base Level)

Able to produce understandable translations of simple documents
pertaining to routine social and business correspondence and areas
of professional experience.

T-2+ LIMITED WORKING PROFICIENCY
(Higher Level)

Able to translate with some precision most factual, nontechnical
prose as well as some documents on concrete topics related to
fields in which he or she has an interest or background.

T-3 GENERAL PROFESSIONAL PROFICIENCY

Able to translate acceptably most formal and informal written
exchanges on practical, social and professional topics.
Demonstrates an emerging ability to translate diverse subject
matter.

T-3+ GENERAL PROFESSIONAL PROFICIENCY
(Higher Level)

Able to translate effectively a variety of documents dealing with
diverse subject matter within the scope of personal or professional
experience.

T-4 ADVANCED PROFESSIONAL PROFICIENCY
(Base Level)

Able to translate very effectively all forms of documents within
the scope of personal and professional experience; can handle other
documents adequately.

T-4+ GENERAL PROFESSIONAL PROFICIENCY
(Higher Level)

Approximates a master translator's ability to produce
translations that are an exact reflection of the original document.

T-5 (Master Translator Proficiency)

Proficiency equivalent to that of a well-educated master
translator. Able to translate even difficult and abstract prose;
for example, general technical and legal texts as well as highly
colloquial writing.
EXHIBIT A

Paragraph Scoring Grid

[Grid illegible in the scanned original. Rows begin with Grammar; columns are skill levels 1, 1+, 2, 2+, 3, 3+, 4, 4+, and 5, with numeric equivalents shown in parentheses, e.g. 1 (1.0), 2 (2.0), 3 (3.0).]
EXHIBIT B

QUESTIONNAIRE ON TRANSLATION SKILL LEVELS

Please read the attached information on translation skill levels.
We ask that you examine the criteria, descriptions, and scoring
grid in light of your experience with translation. Your comments
on this material will help us to develop an accurate test of
translation ability. If you require more space than is provided
after each question, please continue your responses on the back.

Section A. Criteria 

1. What relationship do you see between ILR reading/writing level 
and translation skill level? Do you agree with the assessment of
the relationship described in the criteria? 



2. Do you agree with the description of a "perfect" translation? 
Why or why not? 



3. Are there variables other than those presented that you would 
consider in evaluating translation ability? Do you consider any 
of the variables presented to be unimportant?

Please read through each skill level description and note any
comments regarding a particular description in your responses to
the questions below. Be sure to indicate the skill level
description and the line within that description that your comment
applies to.

1. Do you think any of the characteristics we have included in
Levels 0-5 is inappropriate to that level? If so, which?



2. Where would you add other characteristics? 



3. Would you delete any characteristics from the descriptions? 






4. Are there unclear areas in any of the descriptions? 
5. Do you agree with the description of a Master Translator?



6. What would you add to, change, or delete from this description
(T-5)?



Section C . Scoring Grid 

The attached grid is designed to aid scorers in making a decision
about the appropriate skill level description to assign. Please 
comment on the grid. 

1. Would you find this grid helpful in evaluating a translation 
test? 






2. Where would you make changes to the grid?



3. What would you add to the grid? 



4. Do you agree with the percentages listed for spelling and 
punctuation accuracy? If not, what percentages would you
substitute? 



We would welcome any additional comments you might have. Please
use the rest of this page or an additional sheet to comment on any
aspect of this material. Thank you for your valuable assistance
in developing criteria for rating tests of translation ability.

Sincerely, 

Charles Stansfield 
Marijke Walker 
BACKGROUND PROFICIENCY QUESTIONNAIRE
GIVEN BEFORE TRIALING

Name:
Date:
Test:

Thank you very much for agreeing to take part in the trialing of
the Spanish into English Verbatim Translation Exams. Your comments
about these exams are very important to us. We would like you to
fill out these forms after you have completed each version of the
exam. Please be as clear and frank as possible.

The time limit for completing each section has not yet been
established, but we would like you to work as quickly and accurately
as you can (as if it were a timed exam). Please record the time
needed to complete each section on these forms. This will enable
us to establish the completion times for future examinees.

You are not permitted to use a dictionary on any part of this exam,
except for the last section, which is entitled "Production Section
III." You are also not permitted to receive or give any assistance
regarding these exams. Your cooperation in these matters is
greatly appreciated.

How do you rate your overall Spanish ability?

How do you rate your overall English ability?
EXAM FEEDBACK QUESTIONNAIRE
MULTIPLE CHOICE AND PRODUCTION SECTIONS
(TRIALING VERSION)

Multiple Choice Section I     Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of English?

7) Did this section adequately test your knowledge of Spanish?

8) Were any major points not tested that you feel should have been?

9) Did you feel that this section was too long / too short / just right?

10) Any additional comments? (Continue on the back, if necessary.)
Multiple Choice Section II     Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of English?

7) Did this section adequately test your knowledge of Spanish?

8) Were any major points not tested that you feel should have been?

9) Did you feel that this section was too long / too short / just right?

10) Any additional comments? (Continue on the back, if necessary.)
Production Section I     Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of English?

7) Did this section adequately test your knowledge of Spanish?

8) Were any major points not tested that you feel should have been?

9) Did you feel that this section was too long / too short / just right?

10) Any additional comments? (Continue on the back, if necessary.)



Production Section II     Completion time: ____ hrs. ____ minutes

1) How could the directions be made clearer?

2) How should questions be modified, if any, so that they are less misleading/confusing?

3) Which questions, if any, do you feel should be deleted?

4) Which questions, if any, do you feel should be added?

5) What unintended errors, if any, did you find in this section?

6) Did this section adequately test your knowledge of English?

7) Did this section adequately test your knowledge of Spanish?

8) Were any major points not tested that you feel should have been?

9) Did you feel that this section was too long / too short / just right?

10) Any additional comments? (Continue on the back, if necessary.)
APPENDIX L

SEVTE EXAM FEEDBACK QUESTIONNAIRE
(VALIDATION STUDY)

SPANISH INTO ENGLISH VERBATIM EXAM QUESTIONNAIRE

We would very much appreciate your answers to the following brief questions concerning the
verbatim translation exams you have just taken:

1. Was the length of time given for completing the multiple choice sections about right?

( ) Too short
( ) About right
( ) Too long

2. Was the length of time given for completing the production sections about right?

( ) Too short
( ) About right
( ) Too long

Please indicate to what extent you agree or disagree with the following statements:

3. The directions were clear.

( ) Agree ( ) Disagree

4. The material in the exams was representative of the types of written documents I might
encounter in my work.

( ) Strongly agree ( ) Agree ( ) Disagree ( ) Strongly disagree

5. There was sufficient opportunity for me to demonstrate my ability to translate from
Spanish into English.

( ) Strongly agree ( ) Agree ( ) Disagree ( ) Strongly disagree

Thank you for your cooperation.



APPENDIX M 



PILOT QUESTIONNAIRE AND RESULTS 

ON 

LANGUAGE BACKGROUND AND PROFICIENCY 






Thank you for agreeing to assist us in establishing these tests.
We request that you complete the following information to aid in
our analysis.

Name:

Profession:

Student

Course of Study:

Translator

Teacher

Other (please specify)

Native Language:

English

Spanish

Other (please specify)

How would you rate your ability to write in English?

Excellent
Very good
Good
Fair
Poor

How would you rate your ability to speak in English?

Excellent
Very good
Good
Fair
Poor

How would you rate your ability to write in Spanish?

Excellent
Very good
Good
Fair
Poor

How would you rate your ability to speak in Spanish?

Excellent
Very good
Good
Fair
Poor

Bachelor's in Spanish
Master's in Spanish
Translation Certificate Program
Other (Please specify)



QUESTIONNAIRE RESULTS 



UNDERGRADUATES 

Total R»ttpond»ntsi 45 

English: 36 

Bilingual 
Eng-Span: i 



All data Mlf -reported 



Spanifths 0 
Others 6 



EnqU^h Writino Afr^jj^y; 



Esc«ll#nt: 
Very good: 
6ood: 
Fai r J 

Poor: 



22 
16 
6 
1 

0 



Enoliah Si;>t^ir^rfl Ability! 

Excellentt 29 

Very goods 15 

Goods o 

Fairs i 

Poors 0 



Spanish Writing Ability:

Excellent: 1
Very good: 9
Good: 20
Fair: 12
Poor: 3



Spanish Speaking Ability:

Excellent: 2
Very good: 6
Good: 18
Fair: 16
Poor: 3



GRADUATE STUDENTS

Total Respondents:

Native Language:

English: 3
Bilingual Eng-Span: 0
Spanish:
Other:

All data self-reported



English Writing Ability:

Excellent: 1
Very good: 6
Good:
Fair: 0
Poor: 0



English Speaking Ability:

Excellent: 3
Very good: 4
Good: 3
Fair: 0
Poor:



Spanish Writing Ability:

Excellent: 4
Very good: 3
Good: 1
Fair: 2
Poor: 0



Spanish Speaking Ability:

Excellent: 5
Very good: 2
Good: 2
Fair: [illegible]
Poor: 0
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SELF-ASSESSMENT QUESTIONNAIRE 
AND 

SUMMARY REPORT ON SELF-ASSESSMENT 
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NAME 



FIELD OFFICE 



SELF-ASSESSMENT OF TRANSLATION ABILITY

The purpose of this questionnaire is to learn your candid evaluation of your ability to translate written
documents from SPANISH INTO ENGLISH. It is of the utmost importance that you provide an honest
evaluation of your present abilities so that the effectiveness of the translation exams may be accurately and fully
assessed. Please be assured that your responses will be kept confidential by the test development contractor and
will in no way affect your standing or possibility of advancement within the Bureau.

Instructions: Please estimate your ability to translate the following types of documents using the scale provided
below:

Limited The translated document contains many mistranslations and omissions, and frequent errors in
grammar. The translation is extremely literal (i.e., word for word) and may be difficult to
understand.

Functional The translation is fairly accurate with no substantive omissions; however, it may contain some
mistranslations and grammar errors. The translation is literal but generally understandable.

Competent The accuracy of the translated document is good, with occasional minor mistranslations and
omissions. There is no pattern of grammar errors. Most idiomatic expressions are used
appropriately; however, the phrasing may reveal the document to be a translation.

Superior The accuracy of the translation is excellent, with most nuances conveyed. Grammar errors are rare.
The phrasing is entirely natural and the document does not appear to be a translation.

Please evaluate candidly your ability to translate each of the following types of documents from Spanish into
English by circling the appropriate label. If you have never translated a particular type of document, please
mark N/A ("not applicable").



 1.  Newspaper articles              Limited   Functional   Competent   Superior   N/A
 2.  Newspaper editorials            Limited   Functional   Competent   Superior   N/A
 3.  Depositions                     Limited   Functional   Competent   Superior   N/A
 4.  Police reports                  Limited   Functional   Competent   Superior   N/A
 5.  Correspondence                  Limited   Functional   Competent   Superior   N/A
 6.  Legal documents                 Limited   Functional   Competent   Superior   N/A
 7.  Letters rogatory                Limited   Functional   Competent   Superior   N/A
 8.  Case histories                  Limited   Functional   Competent   Superior   N/A
 9.  FCI status/evaluation reports   Limited   Functional   Competent   Superior   N/A
10.  Scientific/technical articles   Limited   Functional   Competent   Superior   N/A
11.  Foreign diplomatic reports      Limited   Functional   Competent   Superior   N/A
12.  Training manuals                Limited   Functional   Competent   Superior   N/A
13.                                  Limited   Functional   Competent   Superior   N/A
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SUMMARY REPORT ON SELF-ASSESSMENT: SPANISH TO ENGLISH 

The following section is an analysis of the results of the 
Spanish-to-English Self-Assessment Questionnaire that was
completed by FBI personnel participating in the validation study. 

This section specifies: 

1. the document types which the participants checked most 
frequently; 

2. the average rating for each document type; 

3. the per cent of the total respondents who gave a 
response for each document type; 

4. the document types which correlated most significantly 
with the FBI translation skill level descriptions. 

AVERAGE RATING OF EACH DOCUMENT TYPE 

The questionnaire required the employee to rate his or her
ability to translate each document type on a four-point scale.
The options on the scale were: 4, superior; 3, competent; 2,
functional; and 1, limited. The documents listed below were
included. There were 43 respondents to the Spanish-to-English
self-assessment questionnaire. The table below gives the percent
who responded to each document type and the average rating,
ranked in descending order of average rating.

DOCTYPE                                  % RESPONDING   AVERAGE SELF-RATING

SECORRES (correspondence)                     98              3.11
SENEWSAR (newspaper articles)                 86              3.02
SEDEPOS (depositions)                         58              3.00
SENEWSED (news editorials)                    81              2.94
SEPOLRPT (police reports)                     77              2.93
SELETROG (letters rogatory)                   58              2.88
SETRNG (training manuals)                     49              2.85
SECASHST (case histories)                     56              2.83
SELEGAL (legal documents)                     70              2.70
SEDIPL (foreign diplomatic reports)           47              2.70
SEFCI (FCI reports)                           49              2.61
SETECH (technical articles)                   53              2.43



The self-rating most frequently chosen was COMPETENT,
except in the case of technical documents, where an equal number
of respondents chose FUNCTIONAL as their self-rating. News
articles, editorials and correspondence were the document types
most frequently chosen. 
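
The two summary columns above can be reproduced from the raw questionnaire responses. The following is a minimal sketch, assuming each respondent's answer for a document type is stored as one of the four scale labels or as "N/A"; the function name and the sample responses are hypothetical and do not come from the study data.

```python
from statistics import mean

# Numeric values of the four-point scale described in the report.
SCALE = {"Limited": 1, "Functional": 2, "Competent": 3, "Superior": 4}


def summarize(responses):
    """Return (percent responding, average self-rating) for one document type.

    `responses` holds one label per respondent; "N/A" (or None) means the
    respondent gave no rating for this document type.
    """
    ratings = [SCALE[r] for r in responses if r in SCALE]
    percent = round(100 * len(ratings) / len(responses))
    average = round(mean(ratings), 2) if ratings else None
    return percent, average


# Hypothetical answers from four employees for a single document type.
percent, average = summarize(["Competent", "Superior", "N/A", "Functional"])
print(percent, average)
```

Respondents who marked N/A count toward the total number of respondents but not toward the average, which is why the percent responding differs from one document type to the next.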
CORRELATIONS WITH OVERALL SCORES



The table below presents the correlations of each document 
type with the overall scores for Expression and Accuracy. The 
number of paired scores is listed in parentheses below each 
correlation: 



DOCTYPE      EXPF1         EXPF2         ACCF1         ACCF2

SENEWSAR     0.30          0.22          0.50*         0.46*
             (37)          (36)          (37)          (36)

SENEWSED     0.27          0.22          0.57*         0.51*
             (35)          (34)          (35)          (34)

SEDEPOS      0.57*         0.40          0.73*         0.72*
             (25)          (24)          (25)          (24)

SEPOLRPT     0.43*         0.30          0.56*         0.56*
             (33)          (32)          (33)          (32)

SECORRES     0.41*         0.27          0.59*         0.64*
             (42)          (41)          (42)          (41)

SELEGAL      0.43*         0.20          0.55*         0.50*
             (30)          (29)          (30)          (29)

SELETROG     0.51*         0.39*         0.54*         0.62*
             (25)          (25)          (25)          (25)

SECASHST     0.39          0.21          0.52*         0.50*
             (24)          (24)          (24)          (24)

SEFCI        [illegible]   [illegible]   [illegible]   [illegible]
             (21)          (21)          (21)          (21)

SETECH       0.54*         0.23          0.50*         0.42*
             (23)          (22)          (23)          (22)

SEDIPL       0.64*         0.38          0.73*         0.74*
             (20)          (19)          (20)          (19)

SETRNG       0.48*         0.24          0.53*         0.66*
             (21)          (21)          (21)          (21)

*p<.05

On Form 1, the documents showing the highest correlations
for Expression were, in descending order: foreign diplomatic
reports, depositions, technical articles, letters rogatory and FCI
reports. On Form 2, only letters rogatory showed any significant
correlation, which was less than 0.50. By comparison, Accuracy
total correlations were both higher and more frequent.

On Form 1, the documents showing the highest correlation for
Accuracy were, in descending order: foreign diplomatic reports
and depositions (with the same correlation of 0.73); FCI
reports, correspondence, news editorials, and police reports. On
Form 2, these documents were foreign diplomatic reports,
depositions, training manuals, correspondence, letters rogatory,
FCI reports, and police reports.

The magnitude and the order of the correlations for each
type of translation task were almost identical across the two
forms, suggesting that the two forms are consistent in their 
criterion-related validity. 
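
The correlations in this section are computed only over examinees who have both a self-rating and a test score, which is why the number of paired scores varies by document type. A minimal sketch of that calculation follows, assuming missing values are stored as None; the function names and sample data are hypothetical, and the report does not state which significance test produced the asterisks, so the t statistic shown is only the conventional choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def paired_correlation(self_ratings, test_scores):
    """Correlate only the pairs where both values are present; return (r, n)."""
    pairs = [(x, y) for x, y in zip(self_ratings, test_scores)
             if x is not None and y is not None]
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys), len(pairs)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from 0."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data: the third and fourth examinees each lack one value.
r, n = paired_correlation([1, 2, None, 3, 4], [10, 20, 30, None, 50])
print(round(r, 2), n)
```

For example, t_statistic(0.50, 37) is about 3.42, which exceeds the two-tailed .05 critical value for 35 degrees of freedom (roughly 2.03).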
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APPENDIX O 



CONVERSION TABLES: RAW SCORE TO TSL SCORE 
EXPRESSION AND ACCURACY 
Conversion Table

Form 1 - SEVTE

Expression Raw Score    TSL Score

1-15                    *
16-21                   [illegible]
22                      0.7
23                      0.7
24-25                   [illegible]
26                      0.9
27                      0.9
28                      0.9
29-30                   [illegible]
31                      1.1
32-34                   [illegible]
35                      1.3
36                      1.3
37                      1.3
38                      1.4
39                      [illegible]
40                      1.5
41                      1.5
42                      1.6
43                      1.6
44                      1.7
45                      1.7
46                      1.7
47                      1.8
48                      1.8

* 1-15 = chance scores
Form 1 - SEVTE

Expression Raw Score    TSL Score

49-67                   [illegible]
68                      2.7
69-76                   [illegible]
77                      3.1
78                      3.2
79                      3.2
80                      3.3
81                      3.3
82                      3.3
83-88                   [illegible]
89                      3.7
90                      3.7
91                      3.7
92-93                   [illegible]
94                      3.9
95                      3.9
96                      4.0
97                      4.0
98                      4.1
99                      4.1
100                     4.2
Form 1 - SEVTE

Expression Raw Score    TSL Score

101                     4.2
102                     4.2
103                     4.3
104                     4.3
105                     4.4



Conversion Tables 



Accuracy Raw Score TSL Score

1 0.6 

2 0.7 

3 0.7 

4 0.8 

5 0.8 

6 0.9 

7 0.9 

8 0.9 

9 1.0 

10 1.0 

11 1.1 

12 1.1 

13 1.2 

14 1.2 

15 1.3 

16 1.3 

17 1.4 

18 1.4 

19 1.4 

20 1.5 

21 1.5 

22 1.6 

23 1.6 

24 1.7 

25 1.7 

26 1.8 

27 1.8 

28 1.9 

29 1.9 

30 1.9 

31 2.0 
32 2.0

33 2.1 

34 2.1 

35 2.2 

36 2.2 

37 2.3 

38 2.3 

39 2.4 

40 2.4 

41 2.4 

42 2.5 

43 2.5 

44 2.6 

45 2.6 

46 2.7 

47 2.7 

48 2.8 
Form 1 - SEVTE

Accuracy Raw Score TSL Score

49 2.8
50 2.9
51 2.9
52 2.9
53 3.0
54 3.0
55 3.1
56 3.1
57 3.2
58 3.2
59 3.3
60 3.3
61 3.3
62 3.4
63 3.4
64 3.5
65 3.5
66 3.6
67 3.6
68 3.7
69 3.7
70 3.8
71 3.8
72 3.8
73 3.9
74 3.9
75 4.0
76 4.0
77 4.1
78 4.1
79 4.2
80 4.2
Form 2 - SEVTE

Conversion Table

Expression Raw Score TSL Score

1-15 *
16 0.6
17 0.6
18 0.7
19 0.7
20 0.8
21 0.8
22 0.8
23 0.9
24 0.9
25 1.0
26 1.0
27 1.0
28 1.1
29 1.2
30 1.2
31 1.3
32 1.3
33 1.3
34 1.4
35 1.4
36 1.5
37 1.5
38 1.6
39 1.6
40 1.7
41 1.7
42 1.8
43 1.8
44 1.8
45 1.9
46 1.9

* 1-15 = chance scores
Form 2 - SEVTE

Expression Raw Score TSL Score

47 2.0
48 2.0
49 2.0
50 2.1
51 2.2
52 2.2
53 2.3
54 2.3
55 2.4
56 2.4
57 2.4
58 2.5
59 2.5
60 2.6
61 2.6
62 2.7
63 2.7
64 2.8
65 2.8
66 2.9
67 2.9
68 3.0
69 3.0
70 3.0
71 3.1
72 3.1
73 3.2
74 3.2
75 3.3
76 3.3
77 3.4
78 3.4
79 3.4
80 3.5
81 3.5
82 3.6
83 3.6
84 3.7
85 3.7
86 3.8
87 3.8
88 3.9
89 3.9
90 4.0
91 4.0
92 4.1
93 4.1
94 4.1
95 4.2
96 4.2
97 4.3
Expression Raw Score TSL Score

98 4.3 

99 4.4 

100 4.4 

101 4.4 

102 4.5 

103 4.5 

104 4.6 

105 4.6 



Form 2 - SEVTE
Conversion Tables



Accuracy Raw Score TSL Score 

1 0.2 

2 0.3 

3 0.3 

4 0.4 

5 0.4 

6 0.5 

7 0.5 

8 0.6 

9 0.6 

10 0.7 

11 0.8 

12 0.8 

13 0.9 

14 0.9 

15 1.0 

16 1.0 

17 1.1 

18 1.1 

19 1.2 

20 1.2 

21 1.3 

22 1.4 

23 1.4 

24 1.5 

25 1.5 
26 1.6

27 1.6 

28 1.7 

29 1.7 

30 1.8 

31 1.9 

32 1.9 

33 2.0 

34 2.0 

35 2.1 

36 2.1 

37 2.2 

38 2.2 

39 2.3 

40 2.3 

41 2.4 

42 2.5 

43 2.5 

44 2.6 

45 2.6 



Form 2 - SEVTE 
Form 2 - SEVTE

Accuracy Raw Score TSL Score

46 2.7
47 2.7
48 2.8
49 2.8
50 2.9
51 3.0
52 3.0
53 3.1
54 3.1
55 3.2
56 3.2
57 3.3
58 3.3
59 3.4
60 3.4
61 3.5
62 3.5
63 3.6
64 3.7
65 3.7
66 3.8
67 3.8
68 3.9
69 3.9
70 4.0
71 4.1
72 4.1
73 4.2
74 4.2
75 4.3
76 4.3
77 4.4
78 4.4
79 4.5
80 4.5
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APPENDIX P 



MEMORANDUM ON TOTAL SCORE CONVERSION 

TO 

FBI/CAL EQUIVALENCY RATING 






Memo

From: Charles Stansfield
Date: May 15, 1990

Subject: Total score conversion to ILR equivalency rating



As I indicated to you on the phone, we have encountered a
problem in converting the total score on the test to an ILR-like
Translation Rating. Each examinee took two forms of the test, and
each examinee was given an overall ILR-like rating by each of two
raters based on the examinee's performance on each test. The
raters assigned ratings for Accuracy and Expression. Thus, each
examinee received four estimates of his ILR level (two estimates
per form) for accuracy and four estimates of his ILR level for
expression.

We averaged the four estimates of ILR rating to come up with
an overall Translation rating. We then correlated the test scores
with the Translation rating. The high correlation (an average of
.90) allowed us to use the resulting regression equation to predict
Translation rating from the total score on the test. Thus, we were
able to construct a score conversion table for all points on the
test scale which would produce an estimated Translation skill
level.
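
The procedure described in the preceding paragraph can be sketched as follows. This is an illustration under stated assumptions, not the contractor's actual computation: the function names and the sample scores and ratings are hypothetical, and the only details taken from the memo are that a least-squares regression of the averaged rating on the total test score was used to fill in a conversion table for every raw score.

```python
def fit_line(scores, ratings):
    """Least-squares slope and intercept for predicting rating from raw score."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, ratings))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conversion_table(scores, ratings, max_score):
    """Map every raw score from 1 to max_score to a predicted rating (one decimal)."""
    slope, intercept = fit_line(scores, ratings)
    return {raw: round(intercept + slope * raw, 1)
            for raw in range(1, max_score + 1)}


# Hypothetical examinees: total raw score and averaged rater judgment.
table = conversion_table([20, 40, 60, 80], [1.0, 2.0, 3.0, 4.0], max_score=105)
print(table[50])
```

Because the table is generated from the fitted line, it yields a predicted rating even for raw scores that no examinee in the sample actually obtained.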

One of the problems with such conversion tables is a
phenomenon known as the "regression effect" (a different meaning from
the use of regression above). The regression effect means that
examinees whose first score is far from the mean will be predicted
to be closer to the mean on the second score. Thus, most examinees
whose score on our test is at the top of the distribution will be
predicted to have a lower ILR score than they received from the
raters. Similarly, most examinees whose score on our test was at
the bottom of the distribution were predicted to have a higher ILR
score than they received from the raters.
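
The regression effect can be illustrated numerically. In standard-score terms a regression prediction is the examinee's standardized test score multiplied by the correlation, so any correlation below 1.0 pulls the prediction toward the mean. The sketch below assumes the average correlation of .90 cited above; the means and standard deviations are hypothetical values chosen only for illustration.

```python
def predicted_rating(score, mean_score, sd_score, mean_rating, sd_rating, r):
    """Regression prediction of the rating from a raw test score."""
    z = (score - mean_score) / sd_score       # standardized test score
    return mean_rating + r * z * sd_rating    # shrunk toward the mean by r


# Hypothetical distribution: test mean 60, SD 20; rating mean 2.8, SD 0.8.
high = predicted_rating(100, 60, 20, 2.8, 0.8, r=0.90)
low = predicted_rating(20, 60, 20, 2.8, 0.8, r=0.90)
print(round(high, 2), round(low, 2))
```

Under these assumed figures, an examinee two standard deviations above the test mean is predicted to be only 1.8 standard deviations above the rating mean, and an examinee two standard deviations below is predicted to be only 1.8 below.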

Attached is a copy of the scatterplot for 42 FBI examinees.
The ILR expression rating is on the vertical axis, while the total
expression score on our test (ESVTE) is on the horizontal axis.
We have drawn in the regression line with a pencil. This is the
straight line that best fits the distribution. For any other line,
if you calculated the deviations produced by comparing obtained
scores with the predicted scores, the sum of the squared deviations
from that line would be greater.

On this scatterplot each A represents one examinee. Each B
represents two examinees. As indicated in the note at the bottom,
14 examinees' scores are not on the scatterplot because their
scores and the regression line coincided. Thus, for these
examinees, the conversion table worked perfectly. The asterisks
are the computer's representation of the regression line. In this
scatterplot you will see some tendency for the deviations between
the actual and predicted score to be quite small near the center
of the distribution, and larger at the ends. You will also see
some tendency for examinees who scored above SO on the ESVTE to
have a predicted score that is lower than their obtained score.
Similarly, for examinees who scored below 40, the predicted score
is usually higher than the obtained score. Thus, more of the
obtained scores for these people are below the regression line than
above it.

One effect of the regression effect is to lower the range of
ability measured by the test. That is, the highest ability
examinee on this test obtained a rating of 4.5 but the conversion
table predicts his skill level to be 3.8. This person
was probably one of the three professional translators who took the
test.

One option we have, which would reduce the regression effect
described in paragraph three above, is to tilt the regression line
to the left by transforming the scores so that the maximum ILR
score level is higher, 4.5 for example. However, we have no basis
other than intuition for doing this. That is, the sample did not
contain people whom we knew beforehand were at the 4.5 level or
higher. While this seems reasonable, in that it reduces the
regression effect, it also increases slightly the amount of error
in the predicted ILR scores all along the continuum. Thus it
seems unwise.

Another option is to have several people take the test whom
we know to be level 4+ and 5 translators, and enter their results
into the equation. This would have to be done later, however. So,
that's our dilemma. As it stands, no one in the sample would earn
a predicted ILR rating above 3+, and because of the lack of high
ability examinees in the sample, it is not possible to earn a
rating higher than 4.2 on the test, even though we believe it to
be sensitive to differences in ability in the 4-5 range. Further
evidence that the test could discriminate in that range is found
in the fact that the highest raw Expression score on the test was
98 on the ESVTE and 96 on the SEVTE, while the maximum possible
total score was 105. Similarly, for Accuracy, the highest raw
score was 71 on the SEVTE and 75 for the ESVTE, while the maximum
possible total score was 80. Thus, the difficulty level of the
test exceeds the ability level of any examinee in the sample.

As a future project, we should think about how we can identify
at least 10 high level translators and then administer the tests
to them. We would then be able to revise the score conversion
table so that the ILR ratings for high ability candidates are more
accurate than at present, and so that the test will measure ability
up to a higher level than at present.

For the moment, it may be best to leave the conversion table
as is. However, if this conversion table is used, test score users
should be aware that it may underpredict the true levels of
examinees whose predicted ILR rating is 3.5 or above. This
information should be incorporated in any test manual that you
prepare.



In general, I find this disappointing. We tried to make the
test hard enough to measure ability as high as level 5. However,
because 5's did not show up in the sample, the test appears to fail
to measure at such a high level.

On a more positive note, I should say that the test seems to
predict the average Translation skill level rating assigned by our
raters very accurately between the 1.8 and 3.5 range, which is the
range in which most FBI personnel scored.

I should mention one more concern. All of the 17 FBI
employees on whom we had Translation level ratings on the FBI's
current translation test received a lower Translation rating on our
test than on the FBI test. The average difference was about half
a level, with differences typically being larger for examinees
whose FBI test score was 3.8 or above, and being smaller for
examinees whose FBI test score was 2.8 or below. Thus, either a.)
the FBI's current test is too generous, or b.) our raters are too
severe, or c.) the time constraints on our test do not permit the
examinees to revise their translations and demonstrate their true
ability, or d.) the examinees were not motivated to give their best
performance when they took our test, or e.) the examinees' true
Translation ability declined subsequent to taking the FBI test.
Do you have any thoughts about a.) or e.) above?
[Scatterplot: ENSP Form 2, EXPILR12 predicted from EXPTOTF2 (13:57 Tuesday, May 15, 1990). Plot of EXPILR12*EXPTOTF2; legend: A = 1 obs, B = 2 obs, etc. Plot of PRED*EXPTOTF2 (the regression line). Vertical axis: EXPILR12, 0.5 to 4.5. NOTE: 14 obs hidden.]
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APPENDIX Q 



SURVEY 
OF 

FBI TRANSLATION NEEDS 
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Dear Language Specialist. 



The Language Services Unit has contracted with the Center for Applied Linguistics
[illegible] to develop a new translation test, Spanish into English and English into
Spanish. [illegible] to develop a new test which tests more closely for the actual
linguistic tasks carried out by Language Specialists. Therefore, we would really
[illegible] your input. [illegible] attached questionnaire, feel
free to [illegible] comments you think are pertinent. Please note that "% OF YOUR
TIME" refers to the percentage of time that is devoted to the listed tasks when you
[illegible] a language specialist may devote roughly half of his
[illegible] his Spanish-language work to interpretation assignments, but his work with
the Spanish language itself might constitute only a fraction of his entire workday.

If an item does not apply to you, put 0% in the appropriate column. As concerns
the "other (please specify)" listing, please note that we are interested only in tasks
[illegible] on a regular basis. There is no need for you to list any
assignment that was performed once or that is performed only rarely.

[illegible] completed questionnaires to me as soon as possible (Bureau mail);
an addressed envelope has been attached for this purpose.

Thank you so much for your help.




larijke Walker 
Testing Program Manager 
Language Services Unit 
FBIHQ, Room 3505 



Phone HOx4l60 
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FROM ENGLISH TO SPANISH 



I. ORAL TASKS

% OF YOUR TIME

Interpretation Assignments

Check as many as are applicable:

unannounced visitors

tours

conferences

other (please specify)



Oral Proficiency Test (Spanish) 

TASKS INVOLVING WRITTEN MATERIAL 



% OF YOUR TIME % OF YOUR TIME 

TRANSLATING SUMMARIZING 



Legal Documents 

Check as many as are applicable:

letters rogatory
extradition requests

laws/violations/legal rights

wanted posters

other (please specify)



Booklets/Manuals

Check as many as are applicable:

science/technology

tours

training

other (please specify)

Forms 

Check as many as are applicable:

Bureau forms

DOJ forms
other (please specify)

Other (please specify)






% OF YOUR TIME % OF YOUR TIME

SPENT IN TRANSLATING SPENT IN SUMMARIZING



Recorded Conversations:

TELEPHONE

CHECK AS MANY AS ARE APPLICABLE:

politics
business/finance
general theft/white collar crime
foreign counterintelligence
science/technology
military
legal
theft
gambling
counterfeiting
procedures/appointments
payments/purchases
[illegible]
other (please specify)



BODY RECORDER 

CHECK AS MANY AS ARE APPLICABLE:

politics
business/finance
economics
general theft/white collar crime
organized crime
narcotics trafficking
domestic/international terrorism
foreign counterintelligence
science/technology
military
legal
theft
gambling
counterfeiting
kidnapping
procedures/appointments
payments/purchases
[illegible]
other (please specify)



Other (please specify) : 






% OF YOUR TIME % OF YOUR TIME
SPENT IN TRANSLATING SPENT IN SUMMARIZING



Medical Reports:

CHECK AS MANY AS ARE APPLICABLE:

autopsies

other (please specify)

Patents

Other (please specify)



IV. TASKS INVOLVING LISTENING

% OF YOUR TIME % OF YOUR TIME
SPENT IN TRANSLATING SPENT IN SUMMARIZING

Broadcasts:

CHECK AS MANY AS ARE APPLICABLE:

politics
narcotics trafficking
foreign counterintelligence
military
legal
other (please specify)
% OF YOUR TIME % OF YOUR TIME

SPENT IN TRANSLATING SPENT IN SUMMARIZING

Domestic/International Terrorism

CHECK AS MANY AS ARE APPLICABLE:

status and evaluation reports
case histories
police records
court records
travel documents
other (please specify)



Foreign Counterintelligence 

CHECK AS MANY AS ARE APPLICABLE:

status and evaluation reports
[illegible]
intelligence communication methods
case histories
notices of assignment of diplomats
other (please specify)



Treaty Requests/Letters Rogatory
Scientific/Technical

CHECK AS MANY AS ARE APPLICABLE:

chemistry
biology
fingerprinting/DNA typing
computer technology
explosive and incendiary devices
weapons
automobiles and other vehicles
other (please specify)
% OF YOUR TIME % OF YOUR TIME
SPENT IN TRANSLATING SPENT IN SUMMARIZING

Letters to the Director
and other FBI officials:

Teletypes:

(TRANSLATION ONLY)

Legal/Technical:

General Theft/White Collar Crime

CHECK AS MANY AS ARE APPLICABLE:

bank records
police reports
other (please specify)

Organized Crime

CHECK AS MANY AS ARE APPLICABLE:

police reports

Narcotics Trafficking

CHECK AS MANY AS ARE APPLICABLE:

bank records
police reports
court records
other (please specify)



QUESTIONNAIRE TO DETERMINE THE FBI'S TRANSLATION NEEDS

FROM SPANISH INTO ENGLISH

% OF YOUR TIME

ORAL TASKS

Interpretation Assignments:

CHECK AS MANY AS ARE APPLICABLE:
other (please specify)

Oral Proficiency Examinations:

% OF YOUR TIME

GRADING OF FOREIGN LANGUAGE EXAMINATIONS

I. TASKS INVOLVING WRITTEN MATERIAL

% OF YOUR TIME % OF YOUR TIME

SPENT IN TRANSLATING SPENT IN SUMMARIZING

Newspapers/Magazines:

CHECK AS MANY AS ARE APPLICABLE:

editorials
politics
business/finance
economics
general theft/white collar crime
organized crime
narcotics trafficking
domestic/international terrorism
foreign counterintelligence
science/technology
military
legal
other (please specify)
QUESTIONNAIRE RESULTS
TOTAL NUMBER OF RESPONDENTS: 28

(Averages were calculated based on number of respondents to each
question; 0% answers were not factored in unless all answers were
0)
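
The averaging rule in the note above can be made concrete with a short sketch. It assumes each answer is a percentage of time reported by one respondent; the function name and the sample answers are hypothetical.

```python
def average_percent(answers):
    """Average reported % of time, leaving out 0% answers unless every answer is 0.

    Returns (average, number of respondents counted).
    """
    nonzero = [a for a in answers if a > 0]
    if not nonzero:
        return 0.0, 0
    return round(sum(nonzero) / len(nonzero), 1), len(nonzero)


# Hypothetical answers from four respondents to one question.
print(average_percent([10, 0, 5, 0]))
```

One consequence of this rule is that each reported average describes only the respondents who perform the task at all, so a high average can rest on a very small number of respondents.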

ORAL TASKS 

Interpretation Assignments

Number of respondents: 19/28

Average % of time spent: 4.9%

The most frequent category checked by respondents was
"unannounced visitors." Under "other," respondents listed tasks
such as interviewing suspects, handling complaints, and
debriefing informants, witnesses and subjects.

Oral Proficiency Examinations

Number of respondents: 1/28
Average % of time spent: 1.0%



GRADING OF FOREIGN LANGUAGE EXAMINATIONS 

Number of respondents: 1/28

Average % of time spent 70.0% 



TASKS INVOLVING WRITTEN MATERIAL 
Newspapers/Magazines 

% of time              % of time
spent translating      spent summarizing

23.3%                  21.0%

Number of              Number of
respondents            respondents

12/28                  5/28

The categories most chosen by respondents were politics,
narcotics, terrorism, foreign counterintelligence, legal, theft,
and organized crime. The other categories were seldom chosen.
Letters to the Director
and other FBI officials

% of time              % of time
spent translating      spent summarizing

[illegible]            2%

Number of              Number of
respondents            respondents

4/28                   1/28

Teletypes

% of time              % of time
spent translating      spent summarizing

1.0%                   0%

Number of              Number of
respondents            respondents

1/28                   0/28



Legal/Technical 

General Theft/White Collar Crime

% of time              % of time
spent translating      spent summarizing

[illegible]            [illegible]

Number of              Number of
respondents            respondents

12/28                  2/28

All categories were chosen by respondents. Under "other,"
translation of letters was indicated, as well as translation of
affidavits and signed statements. These "other" items were
repeated throughout this section.
Organized Crime

% of time              % of time
spent translating      spent summarizing

8.1%                   [illegible]

Number of              Number of
respondents            respondents

9/28                   1/28

The category most frequently chosen was "police reports."



Narcotics Trafficking 

% of time              % of time
spent translating      spent summarizing

17.1%                  37.5%

Number of              Number of
respondents            respondents

15/28                  4/28

The category most frequently chosen was "court records." Under
"other," translation of letters and ledger (log) notes was
indicated, as were T-III and T-IV translations.



Domestic/International Terrorism

% of time              % of time
spent translating      spent summarizing

13.2%                  25.5%

Number of              Number of
respondents            respondents

10/28                  2/28

The most frequent responses were "case histories" and "court
records." Among "other" responses was translation of
communiqués.
Foreign Counterintelligence

% of time              % of time
spent translating      spent summarizing

18.[illegible]%        24.4%

Number of              Number of
respondents            respondents

18/28                  7/28

The category most frequently chosen was "status and evaluation
reports." Under "other," categories listed include political and
military intelligence and defectors' reports.

Treaty Requests/Letters Rogatory

% of time              % of time
spent translating      spent summarizing

.75%                   0

Number of              Number of
respondents            respondents

2/28                   0/28

Scientific/Technical

% of time              % of time
spent translating      spent summarizing

12%                    0

Number of              Number of
respondents            respondents

6/28                   0

The categories most frequently chosen were explosive and
incendiary devices, weapons, and automobiles and other vehicles.
Fingerprinting/DNA typing and computer technology were seldom
chosen.
Medical Reports

% of time              % of time
spent translating      spent summarizing

3.9%                   0

Number of              Number of
respondents            respondents

8/28                   0

"Other" responses include medical reports to be used as evidence,
progress reports, and hospital reports.

Patents

Number of              Number of
respondents            respondents

0/28                   0

Other: (Respondent listed police reports and ownership/sale
documents).

% of time              % of time
spent translating      spent summarizing

2%                     0

Number of              Number of
respondents            respondents

1/28                   0
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TASKS INVOLVING LISTENING 



I 9t tlMC of tlae 

apcnt trins l it l nt gpcnt «ii«««r<ziny 

44.2» T3* 

NUBber of Nuaber of 

rcSPPntfcnta respondents 
10/28 6/28 

The most frequently chosen category is "narcotics trafficking."
Business/finance, economics, science/technology, military, and
legal were chosen seldom, if at all. "Other" tasks include radio
transmissions and ship-to-shore, ship-to-ship broadcasts.

Monitoring of Live Conversations

Telephone:

% of time                    % of time
spent translating            spent summarizing

33.5%                        25.6%

Number of                    Number of
respondents                  respondents

21/28                        19/28

Categories most often chosen include theft/white collar crime,
organized crime, narcotics trafficking, terrorism, and
counterintelligence. The other categories were seldom chosen.



Body Microphone:

% of time                    % of time
spent translating            spent summarizing

21.8%                        30.6%

Number of                    Number of
respondents                  respondents

18/28                        8/28

The item chosen most often is narcotics trafficking. The other
items on the checklist were seldom chosen. "Other" responses
included microphone surveillance of live monitoring, Title III
live monitoring, TIV, and room ("hidden") mikes.

Recorded Conversations



Telephone:

% of time                    % of time
spent translating            spent summarizing

38.7%                        60.9%

Number of                    Number of
respondents                  respondents

27/28                        14/28



The items most frequently chosen are the same as those for live
conversations. The individual participants seem to have a wider
range of experience with recorded rather than live material.



Body Recorder:

% of time                    % of time
spent translating            spent summarizing

25.0%                        32.0%

Number of                    Number of
respondents                  respondents

26/28                        9/28



Other: (Answers included pretext calls and consensual
recordings)

% of time                    % of time
spent translating            spent summarizing

9.0%                         27.8%

Number of                    Number of
respondents                  respondents

6/28                         4/28

SECOND QUESTIONNAIRE: QUESTIONNAIRE TO DETERMINE FBI'S
TRANSLATION NEEDS



ORAL TASKS 

Interpretation Assignments

Number of respondents: 15/28
% of time spent 5%

The category most often chosen is "unannounced visitors." A
frequent category listed under "other" is listening to three-way
phone calls. Other categories include field interviews of
witnesses and polygraph examinations.

Oral Proficiency Test

Number of respondents: 1/28
% of time spent 4%



WRITTEN TASKS 
Legal Documents

% of time                    % of time
spent translating            spent summarizing

15%                          10.5%

Number of
respondents

2/28

All categories were checked, but "extradition requests" was
chosen very infrequently. "Other" categories listed include:
police reports, depositions, foreign consulate reports, and
statements.




Booklets/Manuals

% of time                    % of time
spent translating            spent summarizing

11.3%                        5%

Number of                    Number of
respondents                  respondents

6/28                         1/28

"Training manuals" and "science/technology" were the items most
often chosen.



% of time                    % of time
spent translating            spent summarizing

[illegible]                  1%

Number of                    Number of
respondents                  respondents

3/28                         2/28

"Bureau forms" was checked most often.

% of time                    % of time
spent translating            spent summarizing

[illegible]                  0

Number of                    Number of
respondents                  respondents

2/28                         0

"Other" responses include correspondence and press releases.

RFP STATEMENT OF WORK



SECTION C - Description/Specs./Work Statement

A. The following requirements and goals must be
met by the offeror:

1. [heading illegible]

a. The developed translation test will be
[illegible] translation skills
of individuals.

b. Currently translation skills are
tested by means of written texts,
which are to be translated verbatim
from the foreign language into English
and from English into the foreign
language. The various tests vary in
difficulty as well as in form and type
of content. Due to the test form and
lack of clear, standardized scoring
criteria, the scores tend to lack
consistency and, hence, reliability.
The tests lack some content validity,
because they fail to measure summary
translation skills from audio stimuli.

c. The contractor is to provide scoring
criteria based on, and consistent
with, the Interagency Language
Roundtable (ILR) level descriptions,
with a scale from 0 to 5. (See
Attachment [illegible] for a copy of the ILR
level descriptions for speaking,
[text missing]) The
test should be constructed in such a
way as to facilitate easy, but finely
[illegible], perhaps by means
of specified point penalty for
categories of errors, e.g.,
mistranslation, grammar, word choice,
style, etc., with an exact, easy to
apply notation system, which would
ultimately result in a score which can
be converted to the 0 through 5 scale.
A rating sheet to register error types
and calibrations will be helpful for
this purpose.

The developed translation test should
consist of an audio stimulus to test
summary translation skill up to level
3, to establish a floor, plus a
written stimulus to test full,
verbatim translation skills between
levels 2+ and 5, to establish a
ceiling. There should be at least one
alternate version of the test for
retesting purposes.

The contractor will be able, to some
extent, to draw on the expertise of the
master translators in the FBI, and
personnel from the FBI could also be
used for the audio portions of the
test if desired.

The desired output should include a
model and alternate in English, and a
Spanish test plus an alternate, and
possibly additional tests in other
languages, all of which should have
been field-tested to provide
quantifiable data regarding
reliability, validity, administrative
ease and scorability.

Upon completion of the contract the
contractor will provide written
instructions for the grading of the
tests and, if necessary, a training
session.

All materials generated during the
course of the research, including
notes and rough drafts, are to be
turned over to the FBI.

Deliverables

The following are required to be
furnished:

a. Monthly progress reports

b. Translation skill level descriptions

c. Audio cassettes with oral recordings
of stimuli and appropriate
documentation:

(1) one plus an alternate in English

(2) one plus an alternate in Spanish

f. Hard copies of written stimuli and
appropriate documentation:

(1) one plus an alternate in English

(2) one plus an alternate in Spanish

g. Grading procedures, rating sheets and
appropriate training manual

h. Three days of training at FBI, 10th
and Pennsylvania Avenue, N.W.,
Washington, D.C.
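
The point-penalty grading scheme called for in requirement c, together with the rating sheets listed among the deliverables, can be illustrated with a short sketch. It is offered only as an illustration of the idea, not as the scoring procedure actually adopted. The penalty weights, the cut-off table, and the function names are all hypothetical assumptions; the statement of work specifies only that errors fall into categories such as mistranslation, grammar, word choice, and style, and that the result must convert to the 0 through 5 scale.

```python
from collections import Counter

# Hypothetical penalty points per error category (not taken from the RFP).
PENALTIES = {
    "mistranslation": 4,
    "grammar": 2,
    "word_choice": 2,
    "style": 1,
}

# Hypothetical cut-offs: the largest total penalty still accepted at each
# level, listed from the strictest level (5) down to the most lenient (1).
LEVEL_CUTOFFS = [(5, 2), (4, 6), (3, 12), (2, 20), (1, 30)]


def total_penalty(errors):
    """Sum the penalty points for a list of error-category labels."""
    counts = Counter(errors)
    unknown = set(counts) - set(PENALTIES)
    if unknown:
        raise ValueError(f"unknown error categories: {sorted(unknown)}")
    return sum(PENALTIES[category] * n for category, n in counts.items())


def to_level(penalty):
    """Convert a total penalty to a 0-5 level using the cut-off table."""
    for level, max_penalty in LEVEL_CUTOFFS:
        if penalty <= max_penalty:
            return level
    return 0  # penalty exceeds every cut-off


def rating_sheet(errors):
    """Tally errors by category and report the penalty and resulting level."""
    penalty = total_penalty(errors)
    return {
        "counts": dict(Counter(errors)),
        "penalty": penalty,
        "level": to_level(penalty),
    }


if __name__ == "__main__":
    marked = ["mistranslation", "grammar", "grammar", "style"]
    print(rating_sheet(marked))
```

In this sketch a rater records one label per error found; the sheet then tallies the labels, sums the weighted penalties, and looks up the level. In practice the weights and cut-offs would have to be calibrated against the ILR-based level descriptions rather than chosen by hand as they are here.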